KR102636431B1

KR102636431B1 - Method of providing video skip function and apparatus performing thereof

Info

Publication number: KR102636431B1
Application number: KR1020220140194A
Authority: KR
Inventors: 권석면; 김유석
Original assignee: 주식회사 일만백만
Priority date: 2022-10-27
Filing date: 2022-10-27
Publication date: 2024-02-14
Also published as: KR20240059603A; WO2024091086A1

Abstract

A method of providing a video data skip function, performed by an apparatus for providing a video data skip function according to an embodiment of the present invention, includes: receiving, while video data is being played, a user's request to jump to a specific highlight part or a specific highlight search request message; providing a plurality of highlight parts that use pre-generated scene data as thumbnails, or providing the specific highlight part corresponding to the specific highlight search request message; and, when playback of a specific highlight part among the plurality of highlight parts, or of the specific highlight part corresponding to the specific highlight search request message, is requested, playing the video data from the point corresponding to that highlight part.

Description

Method for providing a video data skip function and apparatus for executing the same {METHOD OF PROVIDING VIDEO SKIP FUNCTION AND APPARATUS PERFORMING THEREOF}

The present invention relates to a method for providing a video data skip function and an apparatus for executing the same, and more specifically to a method and apparatus that, during playback of video data, move playback to a highlight part selected by the user from among a plurality of highlight parts.

Broadcasters that provide broadcast content commonly offer users video on demand (VOD) services in addition to real-time broadcast channel services such as cable TV and IPTV.

To provide a VOD service, broadcasters divide a piece of video content sequentially into chunks, each typically 3 to 10 seconds long, and store them. These chunks are streamed to the user at the request of a media playback terminal such as a set-top box.

A VOD service may play the target content from the beginning, but it also provides playback controls such as rewinding over a specific section and moving to an arbitrary position. A user can rewatch a favorite scene at any time by rewinding, and can jump to a preset playback position through functions such as bookmarks. If a particular scene section is watched repeatedly, that section can be recognized as a highlight section of the target content.

Highlight section information for VOD content can serve as important information for promoting that content. For example, a broadcaster can configure a plurality of VOD multi-screens and show a highlight section of a specific piece of VOD content on each screen.

By selecting a specific VOD multi-screen, the user can enter the detail screen of the VOD content being played and, after purchasing it, watch the selected VOD content normally.

In providing such a VOD multi-screen service, broadcasters organize and set the highlight section information for specific content manually.

Setting highlight section information manually cannot dynamically reflect the sections that VOD service users actually prefer, as shown by their viewing patterns, and it incurs the labor cost of the programming staff who do the work.

An object of the present invention is to provide a method for providing a video data skip function, and an apparatus for executing the same, that move playback to a highlight part selected by the user from among a plurality of highlight parts during playback of video data.

The objects of the present invention are not limited to the one mentioned above. Other objects and advantages not mentioned here can be understood from the following description and will be understood more clearly from the embodiments of the present invention. It will also be readily apparent that the objects and advantages of the present invention can be realized by the means set out in the claims and combinations thereof.

To achieve this object, a method of providing a video data skip function, performed by an apparatus for providing a video data skip function, includes: receiving, while video data is being played, a user's request to jump to a specific highlight part or a specific highlight search request message; providing a plurality of highlight parts that use pre-generated scene data as thumbnails, or providing the specific highlight part corresponding to the specific highlight search request message; and, when playback of a specific highlight part among the plurality of highlight parts, or of the specific highlight part corresponding to the specific highlight search request message, is requested, playing the video data from the point corresponding to that highlight part.

Also to achieve this object, an apparatus for providing a video data skip function includes: a video data providing unit that plays video data sequentially or from the point corresponding to a highlight part; a scene database that stores scene data generated by dividing the video data and assigned tags of different types; a highlight part generation unit that divides the video data into scenes to generate scene data, assigns tags of different types to the scene data, and stores the scene data in the scene database; and a video data skip function providing unit. When a user's request to jump to a specific highlight part or a specific highlight search request message is received while the video data providing unit is playing video data, the video data skip function providing unit provides a plurality of highlight parts that use the scene data stored in the scene database as thumbnails, or provides the specific highlight part corresponding to the specific highlight search request message. When playback of a specific highlight part among the plurality of highlight parts, or of the specific highlight part corresponding to the specific highlight search request message, is requested, it causes the video data providing unit to play the video data from the point corresponding to that highlight part.

According to the present invention as described above, playback can be moved, during playback of video data, to a highlight part selected by the user from among a plurality of highlight parts.

FIG. 1 is a diagram illustrating a system for providing a video data skip function according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the internal structure of an apparatus for providing a video data skip function according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating an embodiment of a method for providing a video data skip function according to the present invention.
FIGS. 4 to 7 are diagrams illustrating an apparatus for providing a video data skip function according to an embodiment of the present invention.

The above objects, features, and advantages are described in detail below with reference to the accompanying drawings, so that a person of ordinary skill in the art to which the present invention pertains can readily practice its technical idea. In describing the present invention, detailed descriptions of related known technology are omitted where they would unnecessarily obscure the gist of the invention. Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the drawings, the same reference numerals denote the same or similar components.

FIG. 1 is a diagram illustrating a system for providing a video data skip function according to an embodiment of the present invention.

Referring to FIG. 1, the system for providing a video data skip function includes a video data skip function providing device 100, an automatic video generation device 200, a reference video data recommendation device 300, customer terminals 400_1 to 400_N, and user terminals 500_1 to 500_N.

The video data skip function providing device 100 is a device that, during playback of video data, moves playback to a highlight part selected by the user from among a plurality of highlight parts.

To this end, the video data skip function providing device 100 divides the video data into scenes to generate scene data, assigns a tag to each piece of scene data, and stores the scene data in the scene database.

First, the video data skip function providing device 100 divides the video data into scenes to generate scene data.

In one embodiment, the video data skip function providing device 100 may decode the video data into images and then sample the images at intervals of playback time.

In the above embodiment, the video data skip function providing device 100 may generate scene data by grouping the sampled images into scenes based on the similarity between adjacent images. Here, adjacent images are images that neighbor each other when the sampled images are arranged in the order in which the video is played.

For example, the video data skip function providing device 100 may compute image similarity by performing feature matching on adjacent images. That is, it may compare the feature points of adjacent images and group images whose similarity is at or above a predetermined level into a single piece of scene data.
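The grouping step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each sampled frame is represented by a hypothetical feature vector, cosine similarity stands in for feature matching, and the threshold of 0.9 is arbitrary.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def group_into_scenes(frames, threshold=0.9):
    """Group time-ordered sampled frames into scenes.

    frames: list of (timestamp_seconds, feature_vector) in playback order.
    A new scene starts whenever a frame's similarity to the previous
    frame falls below the (assumed) threshold.
    Returns a list of scenes, each a list of timestamps.
    """
    scenes = []
    previous = None
    for timestamp, vector in frames:
        if previous is None or cosine_similarity(previous, vector) < threshold:
            scenes.append([])  # similarity dropped: open a new scene
        scenes[-1].append(timestamp)
        previous = vector
    return scenes

# Hypothetical sampled frames: two similar frames, then two different ones.
frames = [
    (0.0, [1.0, 0.0, 0.1]),
    (1.0, [0.9, 0.1, 0.1]),
    (2.0, [0.0, 1.0, 0.2]),
    (3.0, [0.1, 0.9, 0.2]),
]
scenes = group_into_scenes(frames)
print(scenes)  # [[0.0, 1.0], [2.0, 3.0]]
```

The first timestamp of each group is the natural candidate for the point at which a scene, and hence a possible highlight part, begins.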

In the above embodiment, the video data skip function providing device 100 may calculate the change in the number of objects extracted from each sampled image, determine from that change that the scene has switched, and generate scene data with the corresponding point in time as the boundary.

In the above embodiment, the video data skip function providing device 100 may determine whether the background image has changed by using the change in pixel value at the same pixel positions of the sampled images, determine from the result that the scene has switched, and generate scene data with the corresponding point in time as the boundary.
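A background-change test based on per-pixel differences might look like the sketch below. The grayscale representation and both thresholds (`pixel_delta`, `changed_ratio`) are assumptions for illustration; the source does not specify how the pixel changes are aggregated.

```python
def background_changed(prev_frame, curr_frame, pixel_delta=30, changed_ratio=0.5):
    """Decide whether the background changed between two sampled frames.

    Frames are equally sized 2D lists of grayscale values (0-255).
    A pixel counts as changed when its value differs by more than
    pixel_delta; the background counts as changed when at least
    changed_ratio of all pixels changed. Both thresholds are assumed.
    """
    total = 0
    changed = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for prev_pixel, curr_pixel in zip(prev_row, curr_row):
            total += 1
            if abs(prev_pixel - curr_pixel) > pixel_delta:
                changed += 1
    return total > 0 and changed / total >= changed_ratio

frame_a = [[10, 10], [10, 10]]
frame_b = [[12, 11], [10, 200]]    # one of four pixels changed
frame_c = [[200, 200], [200, 10]]  # three of four pixels changed
print(background_changed(frame_a, frame_b))  # False
print(background_changed(frame_a, frame_c))  # True
```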

In the above embodiment, the video data skip function providing device 100 may use the audio data and subtitle data that make up the video data to determine that a new scene begins at the point where new content appears, and generate scene data accordingly.

In the above embodiment, the video data skip function providing device 100 may extract objects from each sampled image and, when an object disappears or a new object appears, determine that a new scene has begun and generate scene data accordingly.
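The object-based criterion can be illustrated with a short sketch. It assumes an upstream object detector has already produced a set of labels per sampled frame; the labels and the `scene_boundaries` helper are hypothetical.

```python
def scene_boundaries(detections):
    """Find the timestamps at which a new scene starts.

    detections: list of (timestamp_seconds, set_of_object_labels) in
    playback order, as produced by some object detector (assumed).
    A new scene starts at the first frame and whenever an object has
    disappeared or a new object has appeared relative to the previous frame.
    """
    boundaries = []
    previous = None
    for timestamp, objects in detections:
        current = set(objects)
        if previous is None or current != previous:
            boundaries.append(timestamp)
        previous = current
    return boundaries

detections = [
    (0, {"car"}),
    (1, {"car"}),
    (2, {"car", "dog"}),  # a dog appears: new scene
    (3, {"car", "dog"}),
    (4, {"dog"}),         # the car disappears: new scene
]
print(scene_boundaries(detections))  # [0, 2, 4]
```

Comparing only `len(current)` with `len(previous)` instead of the sets themselves would give the object-count variant described earlier.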

The video data skip function providing device 100 also analyzes the scene data and assigns a tag to each piece of scene data.

To this end, the video data skip function providing device 100 extracts feature information from the scene data and assigns tags of different types according to that feature information.

In one embodiment, the video data skip function providing device 100 extracts the features of an object contained in the scene data, expresses them as vector values to generate the object's feature information, and assigns an object attribute tag according to that feature information.

More specifically, the video data skip function providing device 100 may detect the feature regions of an object (interest point detection). Here, a feature region is a main region from which a feature descriptor is extracted, that is, a descriptor of the object's features used to determine whether objects are identical or similar.

According to an embodiment of the present invention, such a feature region may be an outline contained in the object, a corner of that outline, a blob distinguished from its surrounding area, a region that is invariant or covariant under transformation of the scene data, or an extremum that is darker or brighter than its surroundings. It may be taken from a patch (fragment) of the scene data or from the scene data as a whole.

In another embodiment, the video data skip function providing device 100 may extract a feature descriptor from a feature region of the scene data (descriptor extraction) and assign a screen attribute tag to the scene data according to the feature descriptor. A feature descriptor expresses the features of the scene data as vector values.

Such a feature descriptor can be computed from the position of the feature region within the scene data, or from the brightness, color, sharpness, gradient, scale, or pattern information of the feature region. For example, a feature descriptor may be computed by converting the brightness values of the feature region, their changes, or their distribution into a vector.
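As a rough illustration of turning brightness values, their changes, and their distribution into a vector, consider the sketch below. The particular components (mean brightness, mean horizontal change, a 4-bin histogram) are assumptions chosen for brevity; the source does not prescribe a layout.

```python
def brightness_descriptor(region, bins=4):
    """Build a small descriptor vector from a grayscale feature region.

    region: non-empty 2D list of grayscale values (0-255).
    Returns [mean brightness, mean absolute horizontal change,
             normalized histogram with `bins` bins].
    The choice of components is an illustrative assumption.
    """
    pixels = [p for row in region for p in row]
    count = len(pixels)
    mean = sum(pixels) / count

    # Brightness change between horizontally neighboring pixels.
    diffs = [abs(row[i + 1] - row[i]) for row in region for i in range(len(row) - 1)]
    mean_change = sum(diffs) / len(diffs) if diffs else 0.0

    # Brightness distribution as a normalized histogram.
    histogram = [0] * bins
    for p in pixels:
        histogram[min(p * bins // 256, bins - 1)] += 1

    return [mean, mean_change] + [h / count for h in histogram]

region = [[0, 64], [128, 255]]
print(brightness_descriptor(region))
# [111.75, 95.5, 0.25, 0.25, 0.25, 0.25]
```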

According to an embodiment of the present invention, the feature descriptor for scene data can be expressed not only as a local descriptor based on feature regions, as above, but also as a global descriptor, a frequency descriptor, a binary descriptor, or a neural network descriptor.

More specifically, the feature descriptor may include a global descriptor, which is extracted by converting into vector values the brightness, color, sharpness, gradient, scale, pattern information, and so on of the entire scene data, of each zone obtained by dividing the scene data by an arbitrary criterion, or of each feature region.

For example, the feature descriptor may include: a frequency descriptor, extracted by converting into vector values the number of times pre-classified descriptors occur in the scene data or the number of times global features, such as entries of a conventionally defined color table, occur; a binary descriptor, obtained by extracting bit by bit whether each descriptor is present, or whether each element value of a descriptor is larger or smaller than a specific value, and then converting the bits to an integer; and a neural network descriptor, which extracts the image information used for learning or classification in the layers of a neural network.
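The binary descriptor is the easiest of these to show concretely. The following sketch thresholds each element of a descriptor vector into one bit and packs the bits into an integer; the threshold and the Hamming-distance comparison are assumptions added for illustration.

```python
def binary_descriptor(values, threshold):
    """Pack per-element threshold tests into a single integer.

    Each element contributes one bit (first element = most significant):
    1 if the element is larger than threshold, otherwise 0.
    """
    code = 0
    for value in values:
        code = (code << 1) | (1 if value > threshold else 0)
    return code

def hamming_distance(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

first = binary_descriptor([0.9, 0.1, 0.5, 0.7], threshold=0.4)   # bits 1011
second = binary_descriptor([0.9, 0.6, 0.5, 0.1], threshold=0.4)  # bits 1110
print(first, second, hamming_distance(first, second))  # 11 14 2
```

Storing the descriptor as an integer makes comparison a matter of one XOR and a bit count, which is the usual motivation for binary descriptors.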

In yet another embodiment, the video data skip function providing device 100 trains a scene type analysis model on the scene data to extract the type of situation expressed in the scene, and assigns a situation attribute tag according to that type. Here, the scene type means the type of situation expressed in each scene.

In the above embodiment, the video data skip function providing device 100 may build the scene type analysis model as a CNN deep learning model and train it on the above-described data set. The CNN deep learning model may be designed to include two convolution layers, a ReLU layer, a max pooling layer, and one fully connected layer.

In the above embodiment, the video data skip function providing device 100 may use the RCNN technique to build feature sequences, in map order, from the convolution feature maps computed by the CNN, and then feed each feature sequence into a long short-term memory (LSTM) network for training.

In yet another embodiment, the video data skip function providing device 100 extracts a highlight part from the video data and assigns a highlight attribute tag to the scene data corresponding to that highlight part. Here, a highlight part may mean a partial section extracted from the video data, which may be either a section directly designated in the video data or a section extracted automatically.

The video data skip function providing device 100 then determines the part of the video data from which scene data was extracted to be a highlight part, so that the video data can be played from that part. In other words, in this specification the part from which scene data was extracted is used as a bookmark that points to a highlight part of the video data. The video data skip function providing device 100 can therefore provide highlight parts that use the scene data as thumbnails.

In addition, when a user's request to jump to a specific highlight or a specific highlight search request message is received while the video data is being played, the video data skip function providing device 100 moves playback to the highlight part requested by the user.

In one embodiment, when a skip request is received from the user while the video data is being played, the video data skip function providing device 100 provides a plurality of highlight parts, receives the user's selection of a specific highlight part, and moves playback to the selected highlight part.
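The bookmark idea can be sketched with a small data model. Everything named here (`SceneData`, `SkipPlayer`, the thumbnail file names, the `"highlight"` tag key) is hypothetical; the sketch only shows scene start times acting as seek targets for highlight parts.

```python
from dataclasses import dataclass, field

@dataclass
class SceneData:
    start_seconds: float  # where the scene begins in the video (the bookmark)
    thumbnail: str        # hypothetical thumbnail identifier
    tags: dict = field(default_factory=dict)

class SkipPlayer:
    """Toy player that jumps to highlight parts on a skip request."""

    def __init__(self, scenes):
        self.scenes = sorted(scenes, key=lambda s: s.start_seconds)
        self.position = 0.0

    def highlight_parts(self):
        """Scenes offered to the user as selectable highlight parts."""
        return [s for s in self.scenes if s.tags.get("highlight")]

    def jump_to(self, scene):
        """Move the playback position to the start of the selected scene."""
        self.position = scene.start_seconds
        return self.position

player = SkipPlayer([
    SceneData(310.5, "scene_0042.jpg", {"highlight": True}),
    SceneData(12.0, "scene_0003.jpg"),
    SceneData(95.0, "scene_0017.jpg", {"highlight": True}),
])
parts = player.highlight_parts()
print([p.thumbnail for p in parts])  # ['scene_0017.jpg', 'scene_0042.jpg']
print(player.jump_to(parts[1]))      # 310.5
```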

In another embodiment, when a specific highlight search request message is received from the user while the video data is being played, the video data skip function providing device 100 moves playback to the highlight part corresponding to that message. The specific highlight search request message may consist of search text.

In the above embodiment, the video data skip function providing device 100 extracts the search text from the specific highlight search request message, splits the search text into words at whitespace, and determines the frequency of each word from a pre-built per-word frequency database.

The video data skip function providing device 100 then performs morphological analysis on each word and generates tokens, each consisting of a (word, morpheme value) pair with an assigned label indicating the frequency.

For example, by analyzing the search text, the video data skip function providing device 100 may generate tokens such as (frequency: 1000, (word, morpheme value)), (frequency: 234, (word, morpheme value)), (frequency: 2541, (word, morpheme value)), (frequency: 2516, (word, morpheme value)), and so on.
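Token generation along these lines can be sketched as follows. The frequency table and the morpheme lookup are made-up stand-ins: a real system would query the per-word frequency database and run an actual morphological analyzer.

```python
from dataclasses import dataclass

@dataclass
class Token:
    word: str
    morpheme: str   # morpheme value, e.g. "noun" or "adjective"
    frequency: int  # label taken from the per-word frequency database

# Hypothetical stand-ins for the frequency database and the analyzer.
WORD_FREQUENCY = {"red": 1000, "car": 234, "chase": 2541}
MORPHEME_LOOKUP = {"red": "adjective", "car": "noun", "chase": "noun"}

def tokenize(search_text):
    """Split search text on whitespace and build one token per word."""
    tokens = []
    for raw_word in search_text.split():
        word = raw_word.lower()
        tokens.append(Token(
            word=word,
            morpheme=MORPHEME_LOOKUP.get(word, "unknown"),
            frequency=WORD_FREQUENCY.get(word, 0),
        ))
    return tokens

tokens = tokenize("Red car chase")
for token in tokens:
    print((token.frequency, (token.word, token.morpheme)))
# (1000, ('red', 'adjective'))
# (234, ('car', 'noun'))
# (2541, ('chase', 'noun'))
```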

After generating the tokens in this way, the video data skip function providing device 100 assigns a different weight to each token according to the token's word and label.

In one embodiment, the video data skip function providing device 100 assigns each token a different weight according to the language in which the token's word is written (for example, English, Chinese, or Korean), the position of the word within the script text, and the frequency indicated by the label assigned to the token.

First, the video data skip function providing device 100 calculates a first weight using the total number of tokens generated from the script text and the order of each token.

In one embodiment, the video data skip function providing device 100 may calculate the first weight from the token's relative position among all the tokens generated from the script text and from an importance value predetermined according to the language.

For example, when the total number of tokens is 12 and the token is 4th in order, the video data skip function providing device 100 may calculate "0.25" and then apply the importance value predetermined according to the language to obtain the first weight.

The importance value predetermined according to the language may vary with where important words tend to appear in that language. That is, the importance value may change with the number of the current token.
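The source gives only the result 0.25 for the 4th of 12 tokens and does not state the formula. The sketch below therefore rests on explicit assumptions: the positional ratio is taken as `(order - 1) / total`, which reproduces 0.25; the importance values per language and position band are invented numbers; and combining the two by multiplication is one possible reading, not the patent's stated method.

```python
# Hypothetical importance values per language for the first, middle,
# and last third of the token sequence. The numbers are made up.
IMPORTANCE_BY_LANGUAGE = {
    "en": (1.0, 0.8, 0.6),
    "ko": (0.6, 0.8, 1.0),
}

def position_ratio(order, total):
    """Relative position of a token (order is 1-based).

    Assumed formula: (order - 1) / total, chosen because it yields
    0.25 for the 4th of 12 tokens as in the example.
    """
    return (order - 1) / total

def first_weight(order, total, language):
    """Positional ratio scaled by the importance value of its band."""
    ratio = position_ratio(order, total)
    bands = IMPORTANCE_BY_LANGUAGE[language]
    band = min(int(ratio * len(bands)), len(bands) - 1)
    return ratio * bands[band]

print(position_ratio(4, 12))      # 0.25
print(first_weight(4, 12, "en"))  # 0.25
```

Because the importance value is looked up by band, the same token position can receive a different first weight in a different language, which matches the behavior described above.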

In another embodiment, for each token generated from the script text, the video data skip function providing device 100 may calculate a second weight using the frequency indicated by the label pre-assigned to that token and the frequencies indicated by the labels pre-assigned to the previous token and the next token.

The video data skip function providing device 100 then assigns a final weight using the first weight and the second weight.

Next, the video data skip function providing device 100 compares the token with those tags, among the plurality of tags of the scene data, that match the token's morpheme value to calculate a similarity score, and provides highlight parts whose thumbnails are the scene data assigned tags with a similarity score at or above a specific score.

In one embodiment, the video data skip function providing device 100 extracts, from the scene data retrieved from the scene database 130, the scene data assigned a tag that matches the token's morpheme value, compares the tag of the extracted scene data with the token's word, and, if they match, provides a highlight part that uses that scene data as its thumbnail.

In the above embodiment, when the token's morpheme value is a noun, the video data skip function providing device 100 compares the object attribute tags, among the plurality of tags of the scene data retrieved from the scene database 130, with the token's word and, if they match, provides a highlight part that uses that scene data as its thumbnail.

In the above embodiment, when the token's morpheme value is an adjective, the video data skip function providing device 100 compares the screen attribute tags and situation attribute tags, among the plurality of tags of the scene data retrieved from the scene database 130, with the token's word and, if they match, provides a highlight part that uses that scene data as its thumbnail.
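The morpheme-dependent lookup amounts to a dispatch from part of speech to tag type. A minimal sketch, assuming scene data is held as dictionaries with hypothetical `"object"`, `"screen"`, and `"situation"` tag lists:

```python
# Which tag types to compare against, by morpheme value (per the text:
# nouns -> object attribute tags, adjectives -> screen and situation tags).
TAG_TYPES_BY_MORPHEME = {
    "noun": ("object",),
    "adjective": ("screen", "situation"),
}

def matching_scenes(word, morpheme, scenes):
    """Return ids of scenes having an exactly matching tag of the right type.

    scenes: list of dicts like {"id": ..., "tags": {"object": [...], ...}};
    this layout is an assumption for illustration.
    """
    tag_types = TAG_TYPES_BY_MORPHEME.get(morpheme, ())
    matches = []
    for scene in scenes:
        if any(word in scene["tags"].get(tag_type, []) for tag_type in tag_types):
            matches.append(scene["id"])
    return matches

scenes = [
    {"id": "s1", "tags": {"object": ["car"], "screen": ["dark"]}},
    {"id": "s2", "tags": {"object": ["dog"], "situation": ["tense"]}},
    {"id": "s3", "tags": {"object": ["dark"], "screen": ["bright"]}},
]
print(matching_scenes("car", "noun", scenes))        # ['s1']
print(matching_scenes("dark", "adjective", scenes))  # ['s1']
print(matching_scenes("tense", "adjective", scenes)) # ['s2']
```

Note that `s3` carries "dark" only as an object tag, so an adjective query for "dark" does not return it; restricting the comparison by tag type is the point of the dispatch.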

In another embodiment, for scene data extracted from the scene database 130 whose tags do not match the token's morpheme value, the video data skip function providing device 100 matches the plurality of tags of the scene data against the token's word to calculate a similarity ratio, extracts the scene data to which a tag with a similarity ratio at or above a specific score is assigned, and provides a highlight part whose thumbnail is that scene data.

In the above embodiment, the video data skip function providing device 100 matches the characters making up each of the plurality of tags of the scene data against the characters making up the token's word and counts the matching characters. It then compares the string length of the tag with the string length of the token's word and calculates the similarity ratio as the proportion of matching characters relative to the longer of the two lengths. Finally, it extracts the scene data to which a tag with a similarity ratio at or above a specific score is assigned and provides a highlight part whose thumbnail is that scene data.
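As a rough illustration of the similarity ratio described above, the sketch below counts matching characters and divides by the length of the longer string. The source does not say how characters are paired, so counting them as a multiset intersection (via `collections.Counter`) is an assumption, as are the function names and the 0.6 threshold.

```python
from collections import Counter


def similarity_ratio(tag, word):
    """Matching characters divided by the length of the longer string.

    Assumption: characters are matched regardless of position, each
    character being usable once (multiset intersection).
    """
    if not tag or not word:
        return 0.0
    matching = sum((Counter(tag) & Counter(word)).values())
    return matching / max(len(tag), len(word))


def select_scenes(scene_tags, word, threshold=0.6):
    """Return ids of scenes having at least one tag at or above the threshold.

    `scene_tags` maps a scene id to its list of tags (hypothetical layout).
    """
    return [
        scene_id
        for scene_id, tags in scene_tags.items()
        if any(similarity_ratio(tag, word) >= threshold for tag in tags)
    ]


print(similarity_ratio("sunset", "sunsets"))  # 6 matching characters over length 7
print(select_scenes({"a": ["sunset"], "b": ["dog"]}, "sunsets"))
```

Dividing by the longer length keeps a short tag from scoring 1.0 merely because all of its characters appear somewhere in a much longer word.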

The automatic video generation device 200 automatically generates a video at a customer's request. First, the automatic video generation device 200 generates a script using the video generation reference information received from the customer terminals 400_1 to 400_N.

In one embodiment, when the video generation reference information received from the customer terminals 400_1 to 400_N is a word-level keyword, the automatic video generation device 200 may generate the script using, from a previously generated script database, the object attribute corresponding to the keyword, the screen attribute of the scene matched to the object, and the situation attribute of the scene matched to the object.

In the above embodiment, the automatic video generation device 200 may generate the script using text that matches an attribute selected, from among the object attribute corresponding to the keyword, the screen attribute of the scene matched to the object, and the situation attribute of the scene matched to the object, on the basis of behavior information of users who have used content related to the customer.

Thereafter, the automatic video generation device 200 generates a scenario composed of standard scene data on the basis of the script and then extracts keywords from the script.

More specifically, the automatic video generation device 200 splits the script text of the standard scene data on whitespace to extract words and measures the frequency of each word on the basis of a previously generated per-word frequency database.

Then, the automatic video generation device 200 performs morphological analysis on each word to generate tokens, each consisting of a (word, morpheme value) pair to which a label indicating the frequency is assigned.

For example, the automatic video generation device 200 may analyze the text of the script and generate tokens such as (frequency: 1000, (word, morpheme value)), (frequency: 234, (word, morpheme value)), (frequency: 2541, (word, morpheme value)), (frequency: 2516, (word, morpheme value)), and so on.
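The token structure described above, a (word, morpheme value) pair carrying a frequency label, can be sketched as follows. The frequency table and the part-of-speech lexicon are hypothetical stand-ins for the pre-generated frequency database and the morphological analyzer, neither of which the source specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str
    morpheme: str   # morpheme value (part of speech)
    frequency: int  # label taken from the per-word frequency database


# Hypothetical stand-ins for the pre-generated database and the analyzer.
FREQUENCY_DB = {"happy": 1000, "dog": 234, "runs": 2541}
POS_LEXICON = {"happy": "adjective", "dog": "noun", "runs": "verb"}


def tokenize(script_text):
    """Split on whitespace, then attach a morpheme value and frequency label."""
    return [
        Token(
            word=word,
            morpheme=POS_LEXICON.get(word, "unknown"),
            frequency=FREQUENCY_DB.get(word, 0),
        )
        for word in script_text.split()
    ]


for token in tokenize("happy dog runs"):
    print(token)
```

A real system would replace `POS_LEXICON` with a morphological analyzer suited to the script's language; the lookup table is used here only to keep the sketch self-contained.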

After generating the tokens as described above, the automatic video generation device 200 assigns a different weight to each token according to the token's word and the token's label.

In one embodiment, the automatic video generation device 200 assigns a different weight to each token according to the language of the token's word (for example, English, Chinese, or Korean), the position of the word in the text of the script, and the frequency indicated by the label assigned to the token.

First, the automatic video generation device 200 calculates a first weight using the total number of tokens generated from the text of the script and the order of each token.

In one embodiment, the automatic video generation device 200 may calculate the first weight from the relative position of the token's order within the total number of tokens generated from the text of the script and from an importance value predetermined according to the type of language.

For example, when the total number of tokens is 12 and the token is 4th in order, the automatic video generation device 200 may calculate "0.25" and then apply the importance value predetermined according to the type of language to obtain the first weight.
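The source does not state the formula behind the "0.25" in the example above. One reading that reproduces it is a zero-based position divided by the total count, (4 - 1) / 12; a plain 4 / 12 would give about 0.33 instead. The sketch below adopts the zero-based reading as an assumption. Multiplying by the language importance value is also an assumption, and the importance values themselves are hypothetical.

```python
# Hypothetical importance values per language; the source only says they
# are "predetermined according to the type of language".
LANGUAGE_IMPORTANCE = {"ko": 1.0, "en": 0.8, "zh": 0.9}


def positional_value(order, total):
    """Relative position of a token, assuming a zero-based offset.

    `order` is 1-based (the 4th token has order 4).
    """
    if not 1 <= order <= total:
        raise ValueError("order must be between 1 and total")
    return (order - 1) / total


def first_weight(order, total, language):
    """Assumed combination: positional value scaled by language importance."""
    return positional_value(order, total) * LANGUAGE_IMPORTANCE[language]


print(positional_value(4, 12))    # 0.25, matching the example in the text
print(first_weight(4, 12, "ko"))  # 0.25 with an importance value of 1.0
```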

In another embodiment, for each token generated from the text of the script, the automatic video generation device 200 may calculate a second weight using the frequency indicated by the label assigned in advance to that token and the frequencies indicated by the labels assigned in advance to the previous token and the next token.

Thereafter, the automatic video generation device 200 assigns a final weight using the first weight and the second weight.

The automatic video generation device 200 sends the reference video data recommendation device 300 a reference video data recommendation request message containing keywords composed of tokens to which different weights have been assigned, and receives reference video data from the reference video data recommendation device 300.

Thereafter, the automatic video generation device 200 generates video data by combining the reference scene data extracted by the reference video data recommendation device 300 with previously generated environment data.

To this end, the automatic video generation device 200 may select sound data according to the scenario, convert the text data corresponding to the scenario into voice data, and generate an AI actor according to the scenario.

To support automatic video generation at a customer's request, the reference video data recommendation device 300 collects video data, divides the video data into scenes to generate reference scene data, assigns tags to each piece of reference scene data, and stores the result in a reference scene database.

First, the reference video data recommendation device 300 collects video data and then divides the video data into scenes to generate reference scene data.

In one embodiment, the reference video data recommendation device 300 may decode the video data into images and then sample the images at playback-time intervals.
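Sampling at playback-time intervals amounts to choosing which decoded frames to keep. A minimal sketch, assuming a constant frame rate and hypothetical parameter names (the source gives neither the interval nor the frame rate):

```python
def sample_frame_indices(duration_sec, fps, interval_sec):
    """Return indices of the frames kept when sampling every `interval_sec`.

    Assumes a constant frame rate; variable-frame-rate video would need
    per-frame timestamps instead.
    """
    if fps <= 0 or interval_sec <= 0:
        raise ValueError("fps and interval_sec must be positive")
    total_frames = int(duration_sec * fps)
    step = max(1, round(interval_sec * fps))
    return list(range(0, total_frames, step))


# A 10-second clip at 30 fps sampled every 2 seconds.
print(sample_frame_indices(10, 30, 2))
```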

In the above embodiment, the reference video data recommendation device 300 may generate reference scene data by grouping the sampled images into scenes on the basis of the similarity between adjacent sampled images. Here, adjacent images are images that neighbor each other when the sampled images are arranged in playback order.

For example, the reference video data recommendation device 300 may compute the similarity of adjacent images by performing feature matching on them. For instance, the automatic video generation device 200 may compare the feature points of adjacent images and group images whose similarity is at or above a predetermined level into a single scene to generate reference scene data.
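The grouping step above can be sketched as a single pass over the sampled images in playback order, starting a new scene whenever two neighbors fall below a similarity threshold. Cosine similarity over per-image feature vectors is used here only as a simple stand-in for feature matching; the vectors, the function names, and the 0.9 threshold are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_into_scenes(features, threshold=0.9):
    """Group consecutive image indices whose neighbors are similar enough."""
    scenes = []
    for index, feature in enumerate(features):
        if scenes and cosine_similarity(features[index - 1], feature) >= threshold:
            scenes[-1].append(index)  # same scene as the previous image
        else:
            scenes.append([index])    # similarity dropped: start a new scene
    return scenes


features = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.05, 1.0]]
print(group_into_scenes(features))
```

Only neighboring images are compared, so a scene that drifts gradually stays in one group even if its first and last images differ noticeably.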

In the above embodiment, the reference video data recommendation device 300 may calculate the change in the number of objects extracted from each sampled image, determine from that change that the scene has switched, and generate reference scene data with that point in time as the boundary.

In the above embodiment, the reference video data recommendation device 300 may use the change in pixel value at the same pixel position across the sampled images to determine whether the background image has changed, determine from that result that the scene has switched, and generate reference scene data with that point in time as the boundary.
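A minimal sketch of the pixel-based check above compares the same pixel positions in two consecutive sampled images and treats the background as changed when enough pixels differ. Images are modeled as nested lists of grayscale values, and both thresholds are hypothetical; the source gives no concrete criterion.

```python
def changed_fraction(prev, curr, pixel_delta=30):
    """Fraction of pixel positions whose value changed by more than `pixel_delta`."""
    pairs = [
        (p, c)
        for prev_row, curr_row in zip(prev, curr)
        for p, c in zip(prev_row, curr_row)
    ]
    if not pairs:
        return 0.0
    changed = sum(1 for p, c in pairs if abs(p - c) > pixel_delta)
    return changed / len(pairs)


def background_changed(prev, curr, pixel_delta=30, min_fraction=0.5):
    """Assumed rule: the background changed if at least half the pixels moved."""
    return changed_fraction(prev, curr, pixel_delta) >= min_fraction


prev = [[10, 10], [10, 10]]
curr = [[200, 200], [200, 10]]
print(changed_fraction(prev, curr))
print(background_changed(prev, curr))
```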

In the above embodiment, the reference video data recommendation device 300 may determine, on the basis of the audio data and subtitle data making up the video data, that a point at which new content is presented is a new scene, and generate reference scene data accordingly.

In the above embodiment, the reference video data recommendation device 300 may extract the objects in each sampled image and, when an object has disappeared or a new object has appeared, determine that a new scene has begun and generate reference scene data.
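The object-based rule above can be sketched by comparing the set of detected object labels in each sampled image with that of the previous one. Object detection itself is out of scope here; the per-image label sets are assumed to be given, and the function name is hypothetical.

```python
def scene_boundaries(objects_per_image):
    """Return indices of images that start a new scene.

    A boundary is declared whenever an object disappears or a new one
    appears relative to the previous sampled image.
    """
    boundaries = []
    for index in range(1, len(objects_per_image)):
        if set(objects_per_image[index]) != set(objects_per_image[index - 1]):
            boundaries.append(index)
    return boundaries


detections = [{"dog"}, {"dog"}, {"dog", "ball"}, {"ball"}]
print(scene_boundaries(detections))
```

In practice a detector may drop or add an object for a single image because of noise, so a real system would likely require the change to persist over several samples before declaring a boundary.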

In addition, the reference video data recommendation device 300 analyzes the reference scene data and assigns tags to each piece of reference scene data.

To this end, the reference video data recommendation device 300 extracts feature information from the reference scene data and assigns different kinds of tags according to that information.

In one embodiment, the reference video data recommendation device 300 extracts the features of an object included in the reference scene data, expresses them as vector values to generate the object's feature information, and assigns an object attribute tag according to that feature information.

More specifically, the reference video data recommendation device 300 may detect the feature regions of an object (interest point detection). Here, a feature region is a principal region from which a feature descriptor, that is, a descriptor of the object's features used to judge whether objects are identical or similar, is extracted.

According to an embodiment of the present invention, such a feature region may be a contour contained in the object, a corner along a contour, a blob distinguished from its surrounding region, a region that is invariant or covariant under transformation of the reference scene data, or an extremum that is darker or brighter than its surroundings, and it may cover a patch (fragment) of the reference scene data or the entire reference scene data.

In another embodiment, the reference video data recommendation device 300 may extract a feature descriptor from a feature region of the reference scene data (descriptor extraction) and assign a screen attribute tag to the reference scene data according to the feature descriptor. A feature descriptor expresses the features of the reference scene data as vector values.

Such a feature descriptor may be calculated using the position of the feature region within the reference scene data, or the brightness, color, sharpness, gradient, scale, or pattern information of the feature region. For example, the feature descriptor may be calculated by converting the brightness values of the feature region, the change in brightness, or the brightness distribution into a vector.

Meanwhile, according to an embodiment of the present invention, the feature descriptor for the reference scene data may be expressed not only as a local descriptor based on a feature region as above, but also as a global descriptor, a frequency descriptor, a binary descriptor, or a neural network descriptor.

More specifically, the feature descriptor may include a global descriptor, which is extracted by converting into vector values the brightness, color, sharpness, gradient, scale, pattern information, and the like of the entire reference scene data, of each zone obtained by dividing the reference scene data by an arbitrary criterion, or of each feature region.

For example, the feature descriptor may include a frequency descriptor, which is extracted by converting into vector values the number of times particular pre-classified descriptors occur in the reference scene data or the number of times global features such as a conventionally defined color table occur; a binary descriptor, which extracts bit by bit whether each descriptor is present, or whether each element value making up a descriptor is larger or smaller than a specific value, and then converts the bits into an integer; and a neural network descriptor, which extracts the image information used for learning or classification in the layers of a neural network.
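Two of the descriptor kinds above lend themselves to a short sketch: a frequency descriptor that counts how often each entry of a predefined vocabulary occurs, and a binary descriptor that thresholds each element to one bit and packs the bits into an integer. The vocabulary, the threshold, and the function names are illustrative assumptions.

```python
def frequency_descriptor(observed, vocabulary):
    """Count occurrences of each predefined vocabulary entry, in vocabulary order."""
    return [observed.count(entry) for entry in vocabulary]


def binary_descriptor(values, threshold):
    """Threshold each element to one bit and pack the bits into an integer.

    The first element becomes the most significant bit.
    """
    bits = 0
    for value in values:
        bits = (bits << 1) | (1 if value > threshold else 0)
    return bits


print(frequency_descriptor(["red", "blue", "red"], ["red", "green", "blue"]))
print(bin(binary_descriptor([0.9, 0.1, 0.7, 0.2], 0.5)))
```

Packing thresholded values into an integer makes comparison cheap: two binary descriptors can be compared by XOR followed by a bit count, which gives their Hamming distance.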

In yet another embodiment, the reference video data recommendation device 300 trains a scene type analysis model on the reference scene data to extract the type of situation expressed in the scene and assigns a situation attribute tag according to the type of situation. Here, the scene type means the type of situation expressed in each scene.

In the above embodiment, the reference video data recommendation device 300 may build the scene type analysis model as a CNN deep learning model and train it on the data set described above. The CNN deep learning model may be designed to include two convolutional layers, a ReLU layer, a max pooling layer, and one fully connected layer.

In the above embodiment, the reference video data recommendation device 300 may use the RCNN technique to arrange the convolution feature maps produced by the CNN into feature sequences in map order, and then feed each feature sequence into a long short-term memory (LSTM) network for training.

In yet another embodiment, the reference video data recommendation device 300 extracts a highlight portion from the video data and assigns a highlight attribute tag to the reference scene data corresponding to the highlight portion. Here, the highlight portion may mean a partial section extracted from the video data, which may be a section directly designated by the video data or a section that is extracted automatically.

Thereafter, when the reference video data recommendation device 300 receives a reference video data recommendation request message containing keywords composed of tokens to which different weights have been assigned, it extracts reference scene data from the reference scene database on the basis of the request message and provides it to the automatic video generation device 200.

First, the reference video data recommendation device 300 compares the token with those tags, among the plurality of tags of the reference scene data, that match the token's morpheme value, calculates a similarity score, and extracts from the scene database 130 the reference scene data to which a tag with a similarity score at or above a specific score is assigned.

In one embodiment, the reference video data recommendation device 300 extracts, from the scene database 130, the reference scene data to which a tag matching the token's morpheme value is assigned among the plurality of tags of the reference scene data, matches the tags of the extracted reference scene data against the token's word, and, if they match, extracts and provides that reference scene data.

In the above embodiment, when the token's morpheme value is a noun, the reference video data recommendation device 300 matches the object attribute tags, among the plurality of tags of the reference scene data extracted from the scene database 130, against the token's word and, if they match, extracts and provides that reference scene data.

In the above embodiment, when the token's morpheme value is an adjective, the reference video data recommendation device 300 matches the screen attribute tags and situation attribute tags, among the plurality of tags of the reference scene data extracted from the scene database 130, against the token's word and, if they match, extracts and provides that reference scene data.

In another embodiment, for reference scene data extracted from the scene database 130 whose tags do not match the token's morpheme value, the reference video data recommendation device 300 may match the plurality of tags of the reference scene data against the token's word to calculate a similarity ratio, and extract and provide the reference video data to which a tag with a similarity ratio at or above a specific score is assigned.

In the above embodiment, the reference video data recommendation device 300 matches the characters making up each of the plurality of tags of the reference scene data against the characters making up the token's word and counts the matching characters. It then compares the string length of the tag with the string length of the token's word, calculates the similarity ratio as the proportion of matching characters relative to the longer of the two lengths, and may extract and provide the reference video data to which a tag with a similarity ratio at or above a specific score is assigned.

An application for connecting to the web service providing server is installed on the customer terminals 400_1 to 400_N. Accordingly, when the application is selected and executed, the customer terminals 400_1 to 400_N can connect to the video data skip function providing device 100 through the application. The customer terminals 400_1 to 400_N provide video generation reference information to the video data skip function providing device 100 and request automatic generation of a video.

An application for connecting to the web service providing server 200 is installed on the user terminals 500_1 to 500_N. Accordingly, when the application is selected and executed, the user terminals 500_1 to 500_N can connect to the web service providing server through the application.

The user terminals 500_1 to 500_N can display, through the application, a web page provided by the web service providing server 200. Here, the web page includes the screen loaded on the electronic device so that it can be displayed immediately as the user scrolls, and/or the content inside that screen.

For example, when a web page is displayed in the application of a user terminal 500_1 to 500_N, the entire execution screen of the application, which extends horizontally or vertically and is revealed as the user scrolls, may fall within the concept of the web page, and a camera roll screen may likewise fall within the concept of the web page.

In addition, an application for analyzing user interests (for example, software or a neural network model) is installed on the user terminals 500_1 to 500_N. Accordingly, the user terminals 500_1 to 500_N can collect log records or engagement records and then analyze them through the user interest analysis application to determine the user's preferences.

In one embodiment, the user terminals 500_1 to 500_N may analyze the log records or engagement records to extract the user's behavior information, and extract from that behavior information a label for determining the type of content.

In another embodiment, the user terminals 500_1 to 500_N may include a crawler, a parser, and an indexer, collect the web pages the user views, and access the images and text information such as item names and prices contained in those pages to extract a label for determining the type of content.

For example, the crawler collects data related to item information by gathering the list of web addresses the user views, visiting the websites, and following links. The parser then interprets the web pages collected during crawling and extracts item information such as the images, item prices, and item names contained in each page, and the indexer can index the corresponding locations and meanings.
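The parser step above can be illustrated with Python's standard `html.parser` module. This is only a sketch: the CSS class names `item-name` and `item-price` are hypothetical, since the source does not describe the markup of the pages being parsed, and real shop pages would need site-specific rules.

```python
from html.parser import HTMLParser


class ItemParser(HTMLParser):
    """Collect image URLs plus the text of elements with assumed class names."""

    FIELD_CLASSES = ("item-name", "item-price")  # hypothetical class names

    def __init__(self):
        super().__init__()
        self.images = []
        self.fields = {}
        self._current = None  # class of the element whose text comes next

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])
        if attributes.get("class") in self.FIELD_CLASSES:
            self._current = attributes["class"]

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()
            self._current = None


def parse_item(html):
    """Return the images, item name, and item price found in one page."""
    parser = ItemParser()
    parser.feed(html)
    parser.close()
    return {
        "images": parser.images,
        "name": parser.fields.get("item-name"),
        "price": parser.fields.get("item-price"),
    }


page = (
    '<div><img src="/a.png">'
    '<span class="item-name">Red mug</span>'
    '<span class="item-price">12,000</span></div>'
)
print(parse_item(page))
```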

The label for determining the type of content denotes the meaning of the corresponding item, derived from the content the user viewed (for example, in a web browser), the images of content on which the user created a "like" tag (for example, on a social network), and the images and text of homepages the user viewed, all of which are included in the user behavior information.

User viewing records are stored on the user terminals 500_1 to 500_N. The user viewing records include log records and engagement records. A log record is generated by recording the events that occur while the operating system or software of the user terminal 500_1 to 500_N is running.

The user terminals 500_1 to 500_N store the label for determining the type of content, extracted on the basis of the user viewing records. The label for determining the type of content denotes the meaning of the corresponding item, derived from the content the user viewed (for example, in a web browser), the images of content on which the user created a "like" tag (for example, on a social network), and the images and text of homepages the user viewed, all of which are included in the user behavior information.

The web service providing server is a server that, when the user terminals 500_1 to 500_N connect through an application, provides different content depending on the type of application. The web service providing servers 300_1 to 300_N may be implemented as online shopping mall servers, search engine servers, and the like.

FIG. 2 is a diagram illustrating the internal structure of a video data skip function providing device according to an embodiment of the present invention.

Referring to FIG. 2, the video data skip function providing device 100 includes a video data providing unit 110, a highlight part generating unit 120, a scene database 130, and a video data skip function providing unit 140.

The video data providing unit 110 provides video data received from the automatic video generation device 200 or video data requested by the user.

While providing video data sequentially, when playback of a highlight part provided by the video data skip function providing unit 140 is requested, the video data providing unit 110 causes the video data to be played from the point in time corresponding to that highlight part.
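The skip behavior described above reduces to moving the playback position to the start time stored with the selected highlight part. A minimal sketch under assumed names: `HighlightPart`, `Player`, and their fields are hypothetical, and a real player would also have to handle buffering and keyframe alignment.

```python
from dataclasses import dataclass


@dataclass
class HighlightPart:
    label: str
    start_sec: float  # point in the video where the highlight begins
    thumbnail: str    # scene data shown as the thumbnail (path or id)


class Player:
    """Tracks only the playback position; decoding and rendering are omitted."""

    def __init__(self, duration_sec):
        self.duration_sec = duration_sec
        self.position_sec = 0.0

    def jump_to(self, part):
        """Move playback to the start of the requested highlight part."""
        if not 0 <= part.start_sec <= self.duration_sec:
            raise ValueError("highlight start lies outside the video")
        self.position_sec = part.start_sec
        return self.position_sec


parts = [
    HighlightPart("intro", 0.0, "scene_000.jpg"),
    HighlightPart("goal", 42.0, "scene_017.jpg"),
]
player = Player(duration_sec=120.0)
print(player.jump_to(parts[1]))
```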

The highlight part generating unit 120 divides the video data into scenes to generate reference scene data, assigns tags to each piece of reference scene data, and stores the result in the scene database 130.

먼저, 하이라이트 파트 생성부(120)는 영상 데이터를 장면 단위로 분할하여 참조 장면 데이터를 생성한다. First, the highlight part generator 120 divides the image data into scenes and generates reference scene data.

일 실시예에서, 하이라이트 파트 생성부(120)는 영상 데이터로부터 이미지로 디코딩한 후 재생 시간 간격으로 이미지를 샘플링할 수 있다. In one embodiment, the highlight part generator 120 may decode video data into an image and then sample the image at playback time intervals.

상기의 실시예에서, 하이라이트 파트 생성부(120)는 샘플링된 이미지 중 서로 인접한 이미지의 유사도에 기초하여 샘플링된 이미지를 장면 단위로 그룹핑하여 참조 장면 데이터를 생성할 수 있다. 여기에서, 인접한 이미지는 샘플링된 이미지를 영상이 재생되는 시간 순서대로 나열하였을 때 이웃하는 이미지를 의미할 수 있다.In the above embodiment, the highlight part generator 120 may generate reference scene data by grouping the sampled images into scenes based on the similarity of adjacent images among the sampled images. Here, adjacent images may mean neighboring images when sampled images are arranged in the order of video playback time.

예를 들어, 하이라이트 파트 생성부(120)는 인접한 이미지에 대하여 피쳐 매칭(Feature Matching)을 수행하여 이미지의 유사도를 연산할 수 있다. 가령, 영상 데이터 스킵 기능 제공 장치(100)는 인접한 이미지의 특징점을 대조하여 소정 정도 이상의 유사도를 보이는 이미지를 하나의 장면 데이터로 그룹핑하여 참조 장면 데이터를 생성할 수 있다. For example, the highlight part generator 120 may calculate the similarity of images by performing feature matching on adjacent images. For example, the device 100 for providing the image data skip function may generate reference scene data by comparing feature points of adjacent images and grouping images showing a certain degree of similarity or more into one scene data.
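The grouping of adjacent sampled images described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each sampled image has already been reduced to a numeric feature vector, and it substitutes cosine similarity for the feature matching named in the text. The function names and the threshold of 0.9 are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_into_scenes(features, threshold=0.9):
    """Group sampled frames (in playback order) into scenes.

    features: one feature vector per sampled frame (assumed precomputed).
    Returns a list of scenes, each a list of frame indices.
    """
    if not features:
        return []
    scenes = [[0]]
    for i in range(1, len(features)):
        if cosine_similarity(features[i - 1], features[i]) >= threshold:
            scenes[-1].append(i)  # similar to the previous frame: same scene
        else:
            scenes.append([i])    # similarity dropped: start a new scene
    return scenes
```

Only neighboring frames are compared, which mirrors the definition of adjacent images given in the text; a frame is never compared against the first frame of its scene.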

In this embodiment, the highlight part generation unit 120 may calculate the change in the number of objects extracted from each sampled image, determine from that change that the scene has switched, and generate the reference scene data with that point in time as the boundary.

In this embodiment, the highlight part generation unit 120 may determine whether the background image has changed by using the change in pixel value at the same pixel position across the sampled images, determine from the result that the scene has switched, and generate the reference scene data with that point in time as the boundary.

In this embodiment, the highlight part generation unit 120 may determine, based on the audio data and subtitle data that make up the video data, that a point at which new content is presented is a new scene, and generate the reference scene data accordingly.

In this embodiment, the highlight part generation unit 120 may extract objects from each sampled image, determine that a new scene has begun when an object disappears or a new object appears, and generate the reference scene data accordingly.
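The object-based criterion in the last paragraph can be sketched as below. The sketch assumes a separate detector has already produced a list of object labels for each sampled image; the function name and the label representation are hypothetical.

```python
def scene_boundaries_from_objects(objects_per_frame):
    """Return the indices of sampled frames at which a new scene is assumed to start.

    objects_per_frame: for each sampled frame, an iterable of detected object labels
    (assumed to come from an external object detector).
    """
    boundaries = [0] if objects_per_frame else []
    for i in range(1, len(objects_per_frame)):
        previous = set(objects_per_frame[i - 1])
        current = set(objects_per_frame[i])
        if previous != current:  # an object disappeared or a new one appeared
            boundaries.append(i)
    return boundaries
```

Comparing sets rather than counts distinguishes a frame where one object is replaced by another from a frame where nothing changed, which a pure object-count comparison would miss.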

In addition, the highlight part generation unit 120 analyzes the reference scene data and assigns a tag to each piece of reference scene data.

To this end, the highlight part generation unit 120 extracts the features of the reference scene data to obtain its feature information, and assigns different kinds of tags according to the feature information.

In one embodiment, the highlight part generation unit 120 extracts the features of an object included in the reference scene data, expresses them as vector values to generate the object's feature information, and assigns an object attribute tag according to the object's feature information.

More specifically, the highlight part generation unit 120 may detect the feature region of the object (interest point detection). Here, a feature region is the main region from which a feature descriptor is extracted, that is, a descriptor of an object's features used to determine whether objects are identical or similar.

In another embodiment, the highlight part generation unit 120 may extract a feature descriptor from the feature region of the reference scene data (descriptor extraction) and assign a screen attribute tag to the reference scene data according to the feature descriptor. A feature descriptor expresses the features of the reference scene data as vector values.

In yet another embodiment, the highlight part generation unit 120 trains a scene type analysis model on the reference scene data to extract the type of situation expressed in each scene, and assigns a situation attribute tag according to the type of situation. Here, the scene type means the type of situation expressed in each scene.

In this embodiment, the highlight part generation unit 120 may build the scene type analysis model as a CNN deep learning model and train it on the data set described above. The CNN deep learning model may be designed to include two convolutional layers, a ReLU layer, a max pooling layer, and one fully connected layer.

In this embodiment, the highlight part generation unit 120 may use the RCNN technique to construct feature sequences from the convolution feature maps computed by the CNN, in map order, and then train by feeding each feature sequence into a long short-term memory (LSTM) network.

In yet another embodiment, the highlight part generation unit 120 extracts a highlight part from the video data and assigns a highlight attribute tag to the reference scene data corresponding to the highlight part. Here, a highlight part may mean a partial section extracted from the video data, which may be a section designated directly in the video data or a section extracted automatically.

The highlight part generation unit 120 determines the portion of the video data from which reference scene data was extracted to be a highlight part, so that when the user selects a highlight part, the video data is played from that highlight part. That is, in this specification, the portion from which reference scene data was extracted serves as a bookmark pointing to a highlight part of the video data. Accordingly, the highlight part generation unit 120 can provide highlight parts that use the reference scene data as thumbnails.

When, during playback of the video data, the user requests a jump to a specific highlight part or a specific highlight search request message is received, the video data skip function providing unit 140 moves playback to the highlight part requested by the user.

In one embodiment, when a skip request is received from the user during playback of the video data, the video data skip function providing unit 140 presents a plurality of highlight parts, receives the user's selection of a specific highlight part, and moves playback to the selected highlight part.

In another embodiment, when a specific highlight search request message is received from the user during playback of the video data, the video data skip function providing unit 140 moves playback to the highlight part corresponding to that message. Here, the specific highlight search request message may consist of search text.

In this embodiment, the video data skip function providing unit 140 extracts the search text from the specific highlight search request message, splits the search text on spaces to extract words, and measures the frequency of each word based on a pre-generated per-word frequency database.

The video data skip function providing unit 140 then performs morphological analysis on each word to generate tokens, each consisting of a (word, morpheme value) pair and assigned a label indicating the frequency.

For example, the video data skip function providing unit 140 may analyze the search text and generate tokens such as (frequency: 1000, (word, morpheme value)), (frequency: 234, (word, morpheme value)), (frequency: 2541, (word, morpheme value)), (frequency: 2516, (word, morpheme value)), and so on.
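The token structure described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the per-word frequency database is modeled as a plain dictionary, and `morpheme_of` is a hypothetical callable standing in for a real morphological analyzer, which the text does not name.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    morpheme: str   # morpheme value, e.g. "noun" or "adjective"
    frequency: int  # label: frequency looked up in the per-word database


def make_tokens(search_text, frequency_db, morpheme_of):
    """Split the search text on whitespace and build one token per word.

    frequency_db: dict mapping word -> frequency (assumed pre-generated).
    morpheme_of:  hypothetical callable returning the morpheme value of a word.
    """
    tokens = []
    for word in search_text.split():
        # Words missing from the database get frequency 0 (an assumption).
        tokens.append(Token(word, morpheme_of(word), frequency_db.get(word, 0)))
    return tokens
```

For instance, with `frequency_db = {"red": 1000, "car": 234}` and an analyzer that maps "red" to "adjective" and "car" to "noun", the search text "red car" yields two tokens in the same shape as the example in the text.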

After generating the tokens as described above, the video data skip function providing unit 140 assigns a different weight to each token according to the token's word and the token's label.

In one embodiment, the video data skip function providing unit 140 assigns a different weight to each token according to the type of language of the token's word (for example, English, Chinese, or Korean), the position of the word in the text of the script, and the frequency indicated by the label assigned to the token.

First, the video data skip function providing unit 140 calculates a first weight using the total number of tokens generated from the text of the script and the order of each token.

In one embodiment, the video data skip function providing unit 140 may calculate the first weight from the relative position of the token's order within the total number of tokens generated from the text of the script, together with an importance value predetermined for the type of language.

For example, when the total number of tokens is 12 and the token is 4th in order, the video data skip function providing unit 140 may calculate "0.25" and then apply the importance value predetermined for the type of language to obtain the first weight.
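The first weight can be sketched as below, with two assumptions that the text does not state. First, the example value of 0.25 for the 4th of 12 tokens is consistent with (k - 1) / N, so the sketch uses that reading of the position ratio. Second, the language importance value is assumed to be applied as a multiplier. The function and parameter names are hypothetical.

```python
def first_weight(position, total_tokens, language_importance):
    """Position-based weight of a token.

    position:            1-based order of the token in the text.
    total_tokens:        total number of tokens generated from the text.
    language_importance: predetermined importance value for the token's language
                         (assumed here to act as a multiplier).
    """
    if total_tokens <= 0 or not 1 <= position <= total_tokens:
        raise ValueError("position must be between 1 and total_tokens")
    ratio = (position - 1) / total_tokens  # assumed reading: 4th of 12 gives 0.25
    return ratio * language_importance
```

With an importance value of 1.0, the 4th of 12 tokens receives a first weight of 0.25, matching the figure in the example.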

In another embodiment, for each token generated from the text of the script, the video data skip function providing unit 140 may calculate a second weight using the frequency indicated by the label pre-assigned to that token and the frequencies indicated by the labels pre-assigned to the previous token and the next token.

The video data skip function providing unit 140 then assigns a final weight using the first weight and the second weight.

Then, the video data skip function providing unit 140 compares the tags, among the plurality of tags of the reference scene data, that match the morpheme value of the token to calculate a similarity score, and provides highlight parts whose thumbnails are the reference scene data to which a tag with a similarity score at or above a specific score is assigned.

In one embodiment, the video data skip function providing unit 140 extracts, from the scene database 130, the reference scene data to which a tag matching the morpheme value of the token is assigned, matches the tag of the extracted reference scene data against the word of the token, and, if they agree, provides a highlight part whose thumbnail is that reference scene data.

In this embodiment, when the morpheme value of the token is a noun, the video data skip function providing unit 140 matches the object attribute tags, among the plurality of tags of the reference scene data extracted from the scene database 130, against the word of the token and, if they agree, provides a highlight part whose thumbnail is that reference scene data.

In this embodiment, when the morpheme value of the token is an adjective, the video data skip function providing unit 140 matches the screen attribute tags and situation attribute tags, among the plurality of tags of the reference scene data extracted from the scene database 130, against the word of the token and, if they agree, provides a highlight part whose thumbnail is that reference scene data.
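The routing of tokens to tag kinds by morpheme value can be sketched as follows. The mapping of nouns to object attribute tags and of adjectives to screen and situation attribute tags comes from the text; the dictionary layout of a scene record and all names are assumptions made for illustration.

```python
# Which tag kinds a token is compared against, keyed by its morpheme value.
TAG_KINDS_BY_MORPHEME = {
    "noun": ("object",),
    "adjective": ("screen", "situation"),
}


def find_matching_scenes(word, morpheme, scenes):
    """Return the ids of scenes that carry a tag equal to the token's word.

    scenes: list of dicts shaped like
            {"id": ..., "tags": {"object": [...], "screen": [...], "situation": [...]}}
            (a hypothetical layout for records in the scene database).
    """
    kinds = TAG_KINDS_BY_MORPHEME.get(morpheme, ())
    matches = []
    for scene in scenes:
        if any(word in scene["tags"].get(kind, ()) for kind in kinds):
            matches.append(scene["id"])
    return matches
```

Each returned id identifies reference scene data that would be offered as the thumbnail of a highlight part. Tokens with any other morpheme value match nothing in this sketch, since the text only describes the noun and adjective cases.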

In another embodiment, for the reference scene data extracted from the scene database 130 whose tags do not match the morpheme value of the token, the video data skip function providing unit 140 matches the plurality of tags of the reference scene data against the word of the token to calculate a similarity ratio, extracts the reference scene data to which a tag with a similarity ratio at or above a specific score is assigned, and provides a highlight part whose thumbnail is that reference scene data.

In this embodiment, the video data skip function providing unit 140 matches the characters that make up each tag of the reference scene data against the characters that make up the word of the token and counts the matching characters. It then compares the string length of the tag with the string length of the token's word and calculates the similarity ratio as the ratio of the number of matching characters to the longer of the two string lengths. Finally, it extracts the reference scene data to which a tag with a similarity ratio at or above a specific score is assigned and provides a highlight part whose thumbnail is that reference scene data.
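The similarity ratio can be sketched as below. The text does not specify how matching characters are counted, so this sketch assumes a multiset overlap (each character counts as often as it occurs in both strings, regardless of position); a position-by-position comparison would be an equally plausible reading. The function name is hypothetical.

```python
from collections import Counter


def similarity_ratio(tag, word):
    """Ratio of matching characters to the length of the longer string.

    Matching characters are counted as the multiset intersection of the two
    strings (an assumption; the source does not define the counting rule).
    """
    longer = max(len(tag), len(word))
    if longer == 0:
        return 0.0
    common = sum((Counter(tag) & Counter(word)).values())
    return common / longer
```

Dividing by the longer length keeps the ratio at or below 1.0 and penalizes a short tag that merely happens to be contained in a long word.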

FIG. 3 is a flowchart illustrating an embodiment of the method of providing a video data skip function according to the present invention.

Referring to FIG. 3, while playing video data (step S310), the video data skip function providing apparatus 100 receives a request from the user to jump to a specific highlight part, or receives a specific highlight search request message (step S320).

The video data skip function providing apparatus 100 provides a plurality of highlight parts that use pre-generated scene data as thumbnails, or provides the specific highlight part corresponding to the specific highlight search request message (step S330).

When playback of a specific highlight part among the plurality of highlight parts, or playback of the specific highlight part corresponding to the specific highlight search request message, is requested, the video data skip function providing apparatus 100 causes the video data to be played from the point corresponding to that highlight part (step S340).
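Step S340 can be sketched as below. The sketch assumes each highlight part is stored with the start time of its scene in seconds; the `Player` class is a hypothetical stand-in that tracks only the playback position, not a real media player API.

```python
class Player:
    """Minimal stand-in for a video player; tracks only the playback position."""

    def __init__(self, duration_seconds):
        self.duration_seconds = duration_seconds
        self.position_seconds = 0.0

    def seek(self, seconds):
        # Clamp to the valid range so a bad bookmark cannot move past the end.
        self.position_seconds = max(0.0, min(seconds, self.duration_seconds))


def jump_to_highlight(player, highlight_starts, selected_id):
    """Move playback to the start of the selected highlight part (step S340).

    highlight_starts: dict mapping highlight id -> start time in seconds
                      (an assumed representation of the bookmarks).
    """
    if selected_id not in highlight_starts:
        raise KeyError(f"unknown highlight part: {selected_id}")
    player.seek(highlight_starts[selected_id])
    return player.position_seconds
```

The same function serves both paths in the flowchart, since a selection from the thumbnail list and a search result both resolve to a highlight id before playback moves.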

FIGS. 4 to 7 are diagrams illustrating the apparatus for providing a video data skip function according to an embodiment of the present invention.

Referring to FIGS. 4 to 7, in order to generate a video automatically at a customer's request, the video data skip function providing apparatus 100 collects video data 410, divides the video data 410 into scene units to generate reference scene data 420_1 to 420_N, assigns a tag to each piece of reference scene data 420_1 to 420_N, and stores the result in the reference scene database 430.

In one embodiment, the video data skip function providing apparatus 100 may decode the video data 410 into images and then sample the images at intervals of playback time.

In this embodiment, the video data skip function providing apparatus 100 may generate the reference scene data by grouping the sampled images into scene units based on the similarity between adjacent images. Here, adjacent images are images that neighbor each other when the sampled images are arranged in playback order.

In addition, the video data skip function providing apparatus 100 analyzes the reference scene data 420_1 to 420_N and assigns a tag to each piece of reference scene data.

To this end, the video data skip function providing apparatus 100 extracts the features of the reference scene data 420_1 to 420_N to obtain its feature information, and assigns different kinds of tags according to the feature information.

In one embodiment, the video data skip function providing apparatus 100 extracts the features of an object included in the reference scene data 420_1 to 420_N, expresses them as vector values to generate the object's feature information, and assigns an object attribute tag according to the object's feature information.

For example, the video data skip function providing apparatus 100 may extract the features of an object included in the reference scene data 420_3 of FIG. 6(a), express them as vector values to generate the object's feature information as shown in FIG. 6(b), and assign an object attribute tag according to the object's feature information as shown in FIG. 6(a).

More specifically, the video data skip function providing apparatus 100 may detect the feature region of the object (interest point detection) as shown in FIG. 6(b). Here, a feature region is the main region from which a feature descriptor is extracted, that is, a descriptor of an object's features used to determine whether objects are identical or similar.

Although the present invention has been described with reference to limited embodiments and drawings, it is not limited to those embodiments, and a person of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description. Accordingly, the spirit of the present invention should be understood only from the claims set forth below, and all equal or equivalent modifications thereof fall within the scope of the spirit of the present invention.

100: Video data skip function providing apparatus
110: Video data providing unit
120: Highlight part generation unit
130: Scene database
140: Video data skip function providing unit
200: Automatic video generation apparatus
300: Reference image data recommendation apparatus
400_1 to 400_N: Customer terminals
500_1 to 500_N: User terminals

Claims

A method of providing a video data skip function, performed by an apparatus for providing a video data skip function, the method comprising:
receiving, while video data is being played, a request from a user to jump to a specific highlight part or a specific highlight search request message;
providing a plurality of highlight parts that use pre-generated reference image data as thumbnails, or providing a specific highlight part corresponding to the specific highlight search request message; and
when playback of a specific highlight part among the plurality of highlight parts, or playback of the specific highlight part corresponding to the specific highlight search request message, is requested, causing the video data to be played from the point corresponding to the specific highlight part,
wherein the providing of the plurality of highlight parts or of the specific highlight part corresponding to the specific highlight search request message comprises:
extracting search text from the specific highlight search request message when the message is received from the user while the video data is being played;
extracting words from the search text by splitting it on spaces, measuring the frequency of each word based on a pre-generated per-word frequency database, and performing morphological analysis on each word to generate tokens, each consisting of a word and morpheme value pair and assigned a label indicating the frequency;
assigning a different weight to each of the tokens according to the type of language of the token's word, the position of the word in the text of the script, and the frequency indicated by the label assigned to the token; and
calculating a similarity score by comparing at least one of an object attribute tag, a situation attribute tag, and a highlight attribute tag that matches the morpheme value of the token among a plurality of tags of reference scene data, and providing a highlight part whose thumbnail is the reference scene data to which a tag having a similarity score equal to or greater than a specific score is assigned,
and wherein the method further comprises:
generating reference scene data by dividing the video data into scene units;
extracting feature information of an object included in the reference scene data of the scene data, expressing the feature information of the object as vector values to generate the feature information of the object, and assigning an object attribute tag according to the feature information of the object; and
training a scene type analysis model on the reference scene data to extract the type of situation expressed in the scene, assigning a situation attribute tag according to the type of situation, extracting a highlight part from the video data, assigning a highlight attribute tag to the reference scene data corresponding to the highlight part, and storing the result in a scene database.

(Deleted)

An apparatus for providing a video data skip function, the apparatus comprising:
a video data providing unit that plays video data sequentially or from a point corresponding to a highlight part;
a scene database that stores reference scene data generated by dividing the video data and assigned different kinds of tags;
a highlight part generation unit that divides the video data into scene units to generate scene data, extracts feature information of an object included in the reference scene data of the scene data, expresses the feature information of the object as vector values to generate the feature information of the object, assigns an object attribute tag according to the feature information of the object, trains a scene type analysis model on the reference scene data to extract the type of situation expressed in the scene, assigns a situation attribute tag according to the type of situation, extracts a highlight part from the video data, assigns a highlight attribute tag to the reference scene data corresponding to the highlight part, and stores the result in the scene database; and
a video data skip function providing unit that, when a specific highlight search request message is received from a user while the video data is being played, extracts search text from the specific highlight search request message, extracts words from the search text by splitting it on spaces, measures the frequency of each word based on a pre-generated per-word frequency database, performs morphological analysis on each word to generate tokens, each consisting of a word and morpheme value pair and assigned a label indicating the frequency, assigns a different weight to each of the tokens according to the type of language of the token's word, the position of the word in the text of the script, and the frequency indicated by the label assigned to the token, calculates a similarity score by comparing at least one of an object attribute tag, a situation attribute tag, and a highlight attribute tag that matches the morpheme value of the token among a plurality of tags of the reference scene data, and provides a highlight part whose thumbnail is the reference scene data to which a tag having a similarity score equal to or greater than a specific score is assigned.

(Deleted)