KR102560610B1

KR102560610B1 - Reference video data recommend method for video creation and apparatus performing thereof

Info

Publication number: KR102560610B1
Application number: KR1020220140180A
Authority: KR
Inventors: 권석면; 김유석
Original assignee: 주식회사 일만백만
Priority date: 2022-10-27
Filing date: 2022-10-27
Publication date: 2023-07-27
Also published as: WO2024091084A1

Abstract

According to an embodiment of the present invention, a method for recommending reference video data for automatic video generation, executed by a reference video data recommendation apparatus, includes: collecting video data and dividing it into scene units to generate reference scene data; analyzing the reference scene data, by training on it or by extracting feature information from it, and on that basis assigning different types of tags to each piece of reference scene data; and storing the tagged reference video data in a reference video database.

Description

Method for recommending reference video data for automatic video generation and apparatus for executing the same

The present invention relates to a method for recommending reference video data for automatic video generation and an apparatus for executing the same. More specifically, it relates to a method and apparatus that divide video data into scene units to generate reference video data and assign tags to it, so that when a keyword is later received from an automatic video generation apparatus, reference video data carrying a tag corresponding to the keyword is recommended.

To advertise on the Internet, an advertiser must separately produce video advertisements, banner advertisements, flash advertisements, and the like. Producing advertisements takes considerable time and money.

Advertisers with a large number of products (e.g., large corporations, TV home shopping channels, online shopping malls) produce advertisements only for a few representative products. Alternatively, because producing an advertisement for each product is expensive, they produce advertisements that are not tied to a specific product, such as membership sign-up advertisements, brand promotion advertisements, and discount advertisements.

In particular, for online advertising, the Internet users who make up the audience are diverse and the conditions of promotions change constantly, which makes it difficult to produce separate online advertisements for individual products.

For example, for today's new products, products whose sale ends today, and time-limited special offers, the promotion conditions are time-constrained, which makes it difficult to produce advertisements for those products.

An object of the present invention is to provide a method for recommending reference video data for automatic video generation, and an apparatus for executing the same, in which video data is divided into scene units to generate reference video data and tags are assigned to it, so that when a keyword is later received from an automatic video generation apparatus, reference video data carrying a tag corresponding to the keyword is recommended.

The objects of the present invention are not limited to those mentioned above. Other objects and advantages not mentioned can be understood from the following description and will be more clearly understood from the embodiments of the present invention. It will also be readily apparent that the objects and advantages of the present invention can be realized by the means indicated in the claims and combinations thereof.

To achieve this object, a method for recommending reference video data for automatic video generation, executed by a reference video data recommendation apparatus, includes: collecting video data and dividing it into scene units to generate reference scene data; analyzing the reference scene data, by training on it or by extracting feature information from it, and on that basis assigning different types of tags to each piece of reference scene data; and storing the tagged reference video data in a reference video database.

Also to achieve this object, a reference video data providing apparatus includes: a reference scene data extraction unit that collects video data and divides it into scene units to generate reference scene data; a tag assignment unit that analyzes the reference scene data, by training on it or by extracting feature information from it, and on that basis assigns different types of tags to each piece of reference scene data; and a reference video database construction unit that stores the tagged reference video data in a reference video database.

According to the present invention as described above, video data is divided into scene units to generate reference video data and tags are assigned to it, so that when a keyword is later received from an automatic video generation apparatus, reference video data carrying a tag corresponding to the keyword is recommended. This has the advantage that a video can be generated automatically.

FIG. 1 is a diagram for explaining an automatic video generation system according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the internal structure of an automatic video generation apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining the internal structure of a reference video data recommendation apparatus according to an embodiment of the present invention.
FIGS. 4 to 7 are diagrams for explaining an automatic video generation apparatus according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating an embodiment of a method for recommending reference video data for automatic video generation according to the present invention.

The above objects, features, and advantages are described in detail below with reference to the accompanying drawings, so that a person of ordinary skill in the art to which the present invention pertains can readily practice its technical idea. In describing the present invention, detailed descriptions of related known technology are omitted where they would unnecessarily obscure the gist of the invention. Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the drawings, the same reference numerals denote the same or similar components.

FIG. 1 is a diagram for explaining an automatic video generation system according to an embodiment of the present invention.

Referring to FIG. 1, the automatic video generation system includes an automatic video generation apparatus 200, a reference video data recommendation apparatus 300, customer terminals 400_1 to 400_N, and user terminals 500_1 to 500_N.

The automatic video generation apparatus 200 automatically generates a video at a customer's request. Such a video may include an advertisement video or the like.

First, the automatic video generation apparatus 200 generates a script using video generation reference information received from the customer terminals 400_1 to 400_N.

In one embodiment, when the video generation reference information received from the customer terminals 400_1 to 400_N is a word-level keyword, the automatic video generation apparatus 200 may generate a script using, from a pre-built script database, an object attribute corresponding to the keyword, a screen attribute of a scene matched with the object, and a situation attribute of a scene matched with the object.

In this embodiment, the automatic video generation apparatus 200 may generate the script using text that matches an attribute selected, from among the object attribute corresponding to the keyword, the screen attribute of the scene matched with the object, and the situation attribute of the scene matched with the object, on the basis of behavior information of users who have used content related to the customer.

Thereafter, the automatic video generation apparatus 200 generates a scenario composed of base scene data on the basis of the script and then extracts keywords from the script.

More specifically, the automatic video generation apparatus 200 splits the script text of the base scene data on whitespace to extract words, and determines the frequency of each word from a pre-built per-word frequency database.

Then, the automatic video generation apparatus 200 performs morphological analysis on each word to generate tokens, each consisting of a (word, morpheme value) pair labeled with its frequency.

For example, the automatic video generation apparatus 200 may analyze the script text and generate tokens such as (frequency: 1000, (word, morpheme value)), (frequency: 234, (word, morpheme value)), (frequency: 2541, (word, morpheme value)), (frequency: 2516, (word, morpheme value)), and so on.
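The token construction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Token` class, the `WORD_FREQUENCY` table, and the `MORPHEME_LOOKUP` table are hypothetical stand-ins for the pre-built frequency database and the morphological analyzer, neither of which the text specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str
    morpheme: str   # morpheme value, e.g. "noun" or "adjective"
    frequency: int  # label: frequency taken from the per-word database


# Hypothetical stand-ins for the pre-built per-word frequency database
# and for the result of morphological analysis.
WORD_FREQUENCY = {"fresh": 234, "coffee": 1000, "morning": 2541}
MORPHEME_LOOKUP = {"fresh": "adjective", "coffee": "noun", "morning": "noun"}


def tokenize(script_text: str) -> list[Token]:
    """Split on whitespace, then attach a morpheme value and a frequency label."""
    tokens = []
    for word in script_text.split():
        key = word.lower()
        tokens.append(
            Token(
                word=word,
                morpheme=MORPHEME_LOOKUP.get(key, "unknown"),
                frequency=WORD_FREQUENCY.get(key, 0),
            )
        )
    return tokens
```

Calling `tokenize("fresh coffee morning")` yields three tokens, each pairing a word with its morpheme value and carrying the frequency as a label.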

After generating the tokens in this way, the automatic video generation apparatus 200 assigns a different weight to each token according to the token's word and label.

In one embodiment, the automatic video generation apparatus 200 assigns a different weight to each token according to the language in which the token's word is written (e.g., English, Chinese, Korean), the position of the word in the script text, and the frequency indicated by the label assigned to the token.

First, the automatic video generation apparatus 200 calculates a first weight using the total number of tokens generated from the script text and the order of each token.

In one embodiment, the automatic video generation apparatus 200 may calculate the first weight from how far along the token is in the sequence, relative to the total number of tokens generated from the script text, and from an importance value predetermined according to the language.

For example, when the total number of tokens is 12 and the token is the 4th, the automatic video generation apparatus 200 may calculate “0.25” and apply the importance value predetermined according to the language to obtain the first weight.

Here, the importance value predetermined according to the language may vary depending on where important words tend to appear in that language. That is, the importance value predetermined according to the language may change with the number of the current token.
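A minimal sketch of the first weight follows, under several assumptions. The text does not give the formula, so the position ratio is taken here as (order − 1) / total, which is one reading that reproduces the 0.25 quoted for the 4th of 12 tokens. The importance table, its three position bands, its numeric values, and the choice to multiply ratio by importance are all hypothetical.

```python
# Hypothetical importance values per language, keyed by the part of the
# script in which the token falls. The numbers are illustrative only.
IMPORTANCE = {
    "en": {"front": 1.2, "middle": 1.0, "back": 0.8},
    "ko": {"front": 0.8, "middle": 1.0, "back": 1.2},
}


def relative_position(order: int, total: int) -> float:
    """Zero-based position ratio (assumed): the 4th of 12 tokens gives 0.25."""
    if not 1 <= order <= total:
        raise ValueError("order must be between 1 and total")
    return (order - 1) / total


def first_weight(order: int, total: int, language: str) -> float:
    """Position ratio scaled by a language- and position-dependent importance."""
    ratio = relative_position(order, total)
    if ratio < 1 / 3:
        band = "front"
    elif ratio < 2 / 3:
        band = "middle"
    else:
        band = "back"
    # Assumption: the importance value is applied by multiplication.
    return ratio * IMPORTANCE[language][band]
```

Because the importance value is looked up from the token's position band, it changes with the token number, as the paragraph above describes.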

In another embodiment, for each token generated from the script text, the automatic video generation apparatus 200 may calculate a second weight using the frequency indicated by the label pre-assigned to the token and the frequencies indicated by the labels pre-assigned to the previous token and the next token.

Thereafter, the automatic video generation apparatus 200 assigns a final weight using the first weight and the second weight.
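The text names the inputs to the second weight and to the final weight but not the formulas. The sketch below is therefore purely illustrative: it assumes the second weight is the token's frequency relative to the sum of its own and its neighbours' frequencies, and that the final weight is the product of the two weights. Both choices are assumptions, not the patent's method.

```python
def second_weight(prev_freq: int, freq: int, next_freq: int) -> float:
    """Assumed formula: the token's share of the three-token frequency total.

    Pass 0 for a missing neighbour (first or last token).
    """
    total = prev_freq + freq + next_freq
    return freq / total if total else 0.0


def final_weight(first: float, second: float) -> float:
    """Assumed combination: simple product of the two weights."""
    return first * second
```

For the example frequencies 1000, 234, 2541, the middle token would receive a second weight of 234 / 3775 under this assumption.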

The automatic video generation apparatus 200 provides the reference video data recommendation apparatus 300 with a reference video data recommendation request message containing keywords composed of the differently weighted tokens, and receives reference video data from the reference video data recommendation apparatus 300.

Thereafter, the automatic video generation apparatus 200 generates video data by synthesizing the extracted reference scene data with pre-generated environment data.

To this end, the automatic video generation apparatus 200 may select sound data according to the scenario, convert text data corresponding to the scenario into voice data, and generate an AI actor according to the scenario.

To enable videos to be generated automatically at a customer's request, the reference video data recommendation apparatus 300 collects video data, divides it into scene units to generate reference scene data, assigns tags to each piece of reference scene data, and stores the result in a reference scene database.

First, the reference video data recommendation apparatus 300 collects video data and then divides it into scene units to generate reference scene data.

In one embodiment, the reference video data recommendation apparatus 300 may decode the video data into images and then sample the images at playback-time intervals.

In this embodiment, the reference video data recommendation apparatus 300 may generate reference scene data by grouping the sampled images into scenes on the basis of the similarity between adjacent images. Here, adjacent images are images that neighbor each other when the sampled images are arranged in playback order.

For example, the reference video data recommendation apparatus 300 may compute image similarity by performing feature matching on adjacent images. For instance, the automatic video generation apparatus 200 may compare the feature points of adjacent images and group images whose similarity is at or above a predetermined level into one scene to generate reference scene data.
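The grouping of adjacent images by similarity can be sketched as below. The similarity function is passed in as a parameter because the text does not fix a particular feature matching method. The `jaccard` helper, which compares sets of feature-point identifiers, is a hypothetical stand-in for real feature matching, and the threshold is illustrative.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def group_into_scenes(
    frames: Sequence[Frame],
    similarity: Callable[[Frame, Frame], float],
    threshold: float,
) -> List[List[Frame]]:
    """Group frames in playback order; start a new scene when the
    similarity to the previous frame falls below the threshold."""
    scenes: List[List[Frame]] = []
    for frame in frames:
        if scenes and similarity(scenes[-1][-1], frame) >= threshold:
            scenes[-1].append(frame)
        else:
            scenes.append([frame])
    return scenes


def jaccard(a: frozenset, b: frozenset) -> float:
    """Stand-in for feature matching: overlap of two feature-point sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Each frame is compared only with its immediate predecessor, which matches the "adjacent images" wording; comparing against a scene-wide average would be a different design.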

In this embodiment, the reference video data recommendation apparatus 300 may calculate the change in the number of objects extracted from each sampled image, determine from that change that the scene has changed, and generate reference scene data using that point in time as the boundary.

In this embodiment, the reference video data recommendation apparatus 300 may determine whether the background image has changed from the change in pixel value at the same pixel position across the sampled images, determine from the result that the scene has changed, and generate reference scene data using that point in time as the boundary.
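A minimal sketch of the pixel-based check follows. It assumes grayscale frames of equal size represented as lists of rows, and it summarises the per-pixel change as a mean absolute difference compared against a threshold. The text specifies neither the summary statistic nor the threshold, so both are assumptions.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]  # grayscale image as rows of pixel values


def mean_abs_diff(a: Image, b: Image) -> float:
    """Mean absolute difference between pixels at the same positions."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def background_cut_points(frames: Sequence[Image], threshold: float) -> List[int]:
    """Indices i at which frame i differs from frame i - 1 by more than
    the threshold, taken here as a change of background."""
    return [
        i
        for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold
    ]
```

The returned indices mark where a new piece of reference scene data would begin.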

In this embodiment, the reference video data recommendation apparatus 300 may, on the basis of the audio data and subtitle data that make up the video data, treat the point at which new content appears as the start of a new scene and generate reference scene data accordingly.

In this embodiment, the reference video data recommendation apparatus 300 may extract objects from each sampled image and, when an object disappears or a new object appears, determine that a new scene has begun and generate reference scene data accordingly.
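The object-based check can be sketched by comparing the set of object labels detected in consecutive sampled images. The object detector itself is out of scope here: the input is assumed to be one collection of label strings per sampled image, and the function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def object_cut_points(objects_per_frame: Sequence[Iterable[str]]) -> List[int]:
    """Indices at which an object has disappeared or a new one has appeared
    relative to the previous sampled image."""
    cuts = []
    for i in range(1, len(objects_per_frame)):
        previous = set(objects_per_frame[i - 1])
        current = set(objects_per_frame[i])
        if previous != current:
            cuts.append(i)
    return cuts


def scene_ranges(frame_count: int, cuts: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn cut indices into half-open (start, end) frame ranges."""
    bounds = [0, *cuts, frame_count]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]
```

Comparing sets rather than counts also catches the case where one object is replaced by another while the total number stays the same.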

The reference video data recommendation apparatus 300 also analyzes the reference scene data and assigns tags to each piece of reference scene data.

To this end, the reference video data recommendation apparatus 300 extracts feature information from the reference scene data and assigns different types of tags according to the feature information.

In one embodiment, the reference video data recommendation apparatus 300 extracts the features of an object included in the reference scene data, expresses them as vector values to generate object feature information, and assigns an object attribute tag according to the object feature information.

More specifically, the reference video data recommendation apparatus 300 may detect feature regions of an object (interest point detection). Here, a feature region is a key region from which a feature descriptor is extracted, that is, a descriptor of the object's features used to determine whether objects are identical or similar.

According to an embodiment of the present invention, such a feature region may be a contour of the object, a corner or similar vertex along a contour, a blob distinguished from its surroundings, a region that is invariant or covariant under transformations of the reference scene data, or an extremum that is darker or brighter than its surroundings. It may be taken over a patch (fragment) of the reference scene data or over the reference scene data as a whole.

In another embodiment, the reference video data recommendation apparatus 300 may extract a feature descriptor from a feature region of the reference scene data (descriptor extraction) and assign a screen attribute tag to the reference scene data according to the feature descriptor. A feature descriptor expresses the features of the reference scene data as vector values.

Such a feature descriptor may be calculated using the position of the feature region within the reference scene data, or the brightness, color, sharpness, gradient, scale, or pattern information of the feature region. For example, a feature descriptor may be calculated by converting the brightness values, brightness changes, or brightness distribution of the feature region into a vector.
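As one illustration of turning brightness into a vector, the sketch below builds a descriptor from the mean brightness, the mean horizontal brightness change, and a normalised brightness histogram of a region. The layout of the vector, the bin count, and the assumption of 8-bit grayscale values (0 to 255) are choices made for this example and are not taken from the text.

```python
from typing import List, Sequence


def brightness_descriptor(region: Sequence[Sequence[int]], bins: int = 4) -> List[float]:
    """Return [mean brightness, mean horizontal change, histogram...]
    for a region of 8-bit grayscale pixels given as rows."""
    pixels = [p for row in region for p in row]
    if not pixels:
        raise ValueError("region must contain at least one pixel")

    mean = sum(pixels) / len(pixels)

    # Brightness change: absolute difference between horizontal neighbours.
    changes = [abs(row[i + 1] - row[i]) for row in region for i in range(len(row) - 1)]
    mean_change = sum(changes) / len(changes) if changes else 0.0

    # Brightness distribution: normalised histogram over equal-width bins.
    histogram = [0] * bins
    for p in pixels:
        histogram[min(p * bins // 256, bins - 1)] += 1

    return [mean, mean_change] + [h / len(pixels) for h in histogram]
```

Descriptors built this way can be compared with any vector distance, which is what makes them usable for assigning tags.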

According to an embodiment of the present invention, the feature descriptor for the reference scene data may be expressed not only as a local descriptor based on feature regions as described above, but also as a global descriptor, a frequency descriptor, a binary descriptor, or a neural network descriptor.

More specifically, the feature descriptor may include a global descriptor, which is extracted by converting into vector values the brightness, color, sharpness, gradient, scale, pattern information, and the like of the reference scene data as a whole, of each zone obtained by dividing the reference scene data by an arbitrary criterion, or of each feature region.

For example, the feature descriptor may include: a frequency descriptor, extracted by converting into vector values the number of times predefined descriptors occur in the reference scene data or the number of times global features such as a conventionally defined color table occur; a binary descriptor, obtained by extracting bit by bit whether each descriptor is present, or whether each element value of a descriptor is larger or smaller than a specific value, and then converting the bits into an integer; and a neural network descriptor, which extracts the image information used for training or classification in a layer of a neural network.
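The frequency and binary descriptors lend themselves to a short sketch. In this illustration the predefined features are plain strings and the element values are floats. The function names, the vocabulary, and the threshold are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def frequency_descriptor(observed: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Count how often each predefined feature occurs, in vocabulary order."""
    counts = Counter(observed)
    return [counts[name] for name in vocabulary]


def binary_descriptor(values: Sequence[float], threshold: float) -> int:
    """Emit one bit per element (1 if above the threshold), most significant
    bit first, and pack the bits into a single integer."""
    bits = 0
    for value in values:
        bits = (bits << 1) | (1 if value > threshold else 0)
    return bits
```

For instance, `binary_descriptor([0.9, 0.1, 0.7, 0.2], 0.5)` packs the bits 1, 0, 1, 0 into the integer 10. Packed integers can be compared cheaply by counting differing bits.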

In yet another embodiment, the reference video data recommendation apparatus 300 trains a scene type analysis model on the reference scene data to extract the type of situation depicted in the scene, and assigns a situation attribute tag according to that type. Here, the scene type means the type of situation depicted in each scene.

In this embodiment, the reference video data recommendation apparatus 300 may build the scene type analysis model as a CNN deep learning model and train it on the data set described above. The CNN deep learning model may be designed to include two convolutional layers, a ReLU layer, a max pooling layer, and one fully connected layer.

In this embodiment, the reference video data recommendation apparatus 300 may use an RCNN technique to arrange the convolution feature maps produced by the CNN, in map order, into feature sequences, and then feed each feature sequence into a long short-term memory (LSTM) network for training.

In yet another embodiment, the reference video data recommendation apparatus 300 extracts a highlight portion from the video data and assigns a highlight attribute tag to the reference scene data corresponding to the highlight portion. Here, the highlight portion may be a section extracted from the video data, which may be a section directly designated by the video data or an automatically extracted section.

Thereafter, upon receiving a reference video data recommendation request message containing keywords composed of differently weighted tokens, the reference video data recommendation apparatus 300 extracts reference scene data from the reference scene database 330 on the basis of the message and provides it to the automatic video generation apparatus 200.

First, the reference video data recommendation apparatus 300 calculates a similarity score by comparing the token against the tag, among the plurality of tags of the reference scene data, that matches the token's morpheme value, and extracts from the reference scene database 330 the reference scene data to which a tag with a similarity score at or above a specific score is assigned.

In one embodiment, the reference video data recommendation apparatus 300 extracts, from the reference scene database 330, reference scene data to which a tag matching the token's morpheme value is assigned among its plurality of tags, compares the tag of the extracted reference scene data with the token's word, and, if they match, extracts and provides that reference scene data.

In this embodiment, when the token's morpheme value is a noun, the reference video data recommendation apparatus 300 compares the object attribute tag, among the plurality of tags of the reference scene data extracted from the reference scene database 330, with the token's word and, if they match, extracts and provides that reference scene data.

In this embodiment, when the token's morpheme value is an adjective, the reference video data recommendation apparatus 300 compares the screen attribute tag and the situation attribute tag, among the plurality of tags of the reference scene data extracted from the reference scene database 330, with the token's word and, if they match, extracts and provides that reference scene data.
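The morpheme-dependent matching described above (nouns against object attribute tags, adjectives against screen and situation attribute tags) can be sketched as a lookup from morpheme value to the tag types to search. The `Scene` class, the dictionary layout of the tags, and the exact-match comparison are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set


@dataclass
class Scene:
    scene_id: str
    tags: Dict[str, Set[str]]  # tag type -> tag values


# Which tag types to search for each morpheme value, per the description.
TAG_TYPES_BY_MORPHEME = {
    "noun": ("object",),
    "adjective": ("screen", "situation"),
}


def recommend(scenes: Sequence[Scene], word: str, morpheme: str) -> List[str]:
    """Return the ids of scenes whose relevant tags contain the token's word."""
    tag_types = TAG_TYPES_BY_MORPHEME.get(morpheme, ())
    return [
        scene.scene_id
        for scene in scenes
        if any(word in scene.tags.get(tag_type, set()) for tag_type in tag_types)
    ]
```

A word that appears only under a tag type not associated with its morpheme value is ignored, so the adjective "bright" does not match a scene that has "bright" only as an object tag.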

In another embodiment, for reference scene data extracted from the reference scene database 330 whose tags do not match the token's morpheme value, the reference video data recommendation apparatus 300 may calculate a similarity ratio by comparing the plurality of tags of the reference scene data with the token's word, and extract and provide the reference video data to which a tag with a similarity ratio at or above a specific score is assigned.

상기의 실시예에서, 참조 영상 데이터 추천 장치(300)는 참조 장면 데이터의 복수의 태그 및 상기 토큰의 단어 각각을 구성하는 문자를 매칭시켜 일치하는 문자의 개수를 산출하고, 복수의 태그에 해당하는 스트링 수 및 상기 토큰의 단어에 해당하는 스트링 수를 비교하여 더 긴 스트링수를 기준으로 상기 일치하는 문자의 개수의비율에 따라 상기 유사 비율을 산출하고, 상기 유사 비율이 특정 점수 이상인 태그가 할당된 참조 영상 데이터를 추출하여 제공할 수 있다. In the above embodiment, the reference image data recommendation apparatus 300 matches a plurality of tags of reference scene data and characters constituting each word of the token to calculate the number of matched characters, compares the number of strings corresponding to the plurality of tags and the number of strings corresponding to the word of the token, calculates the similarity rate according to the ratio of the number of matched characters based on the longer number of strings, and extracts and provides reference image data to which a tag having a similarity rate equal to or higher than a specific score is assigned.
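The similarity ratio can be illustrated with a short sketch. The text does not say how matching characters are counted, so this example assumes a multiset intersection of the two strings' characters; the function names and the threshold value 0.6 are likewise assumptions made for illustration.

```python
from collections import Counter


def similarity_ratio(tag: str, word: str) -> float:
    """Matching characters divided by the length of the longer string."""
    if not tag or not word:
        return 0.0
    # Assumption: characters match regardless of position, counted with multiplicity.
    common = Counter(tag) & Counter(word)
    matched = sum(common.values())
    return matched / max(len(tag), len(word))


def similar_tags(tags, word, threshold=0.6):
    """Keep the tags whose similarity ratio reaches the (hypothetical) threshold."""
    return [tag for tag in tags if similarity_ratio(tag, word) >= threshold]


ratio = similarity_ratio("sunset", "sunrise")
kept = similar_tags(["sunset", "car"], "sunrise")
```

Dividing by the longer length keeps a short tag from scoring highly merely because all of its few characters occur in a long word.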

An application for accessing the web service providing server is installed on each of the customer terminals 400_1 to 400_N. When the application is selected and executed, the customer terminals 400_1 to 400_N can access the automatic video generation apparatus 200 through the application. The customer terminals 400_1 to 400_N request automatic generation of a video by providing video generation reference information to the automatic video generation apparatus 200.

An application for accessing the web service providing server 200 is installed on each of the user terminals 500_1 to 500_N. When the application is selected and executed, the user terminals 500_1 to 500_N can access the corresponding web service providing server through the application.

The user terminals 500_1 to 500_N may display, through the application, a web page provided by the web service providing server 200. Here, the web page includes a screen loaded on the electronic device, and/or the content inside that screen, so that it can be displayed immediately as the user scrolls.

For example, when a web page is displayed in the application of the user terminals 500_1 to 500_N, the concept of the web page may include the entire execution screen of an application that extends horizontally or vertically and is displayed as the user scrolls, and may also include a screen showing the camera roll.

In addition, an application for analyzing user interests (for example, software or a neural network model) is installed on the user terminals 500_1 to 500_N. The user terminals 500_1 to 500_N collect log records or engagement records and then analyze them through this application to determine the user's preferences.

In one embodiment, the user terminals 500_1 to 500_N may analyze the log records or engagement records to extract user behavior information, and may extract from that behavior information a label for determining the type of content.

In another embodiment, the user terminals 500_1 to 500_N are equipped with a crawler, a parser, and an indexer. They collect the web pages viewed by the user and access the images and text information, such as item names and prices, contained in those pages to extract a label for determining the type of content.

For example, the crawler collects data related to item information by gathering a list of the web addresses the user views, checking the websites, and following their links. The parser interprets the web pages collected during crawling and extracts item information such as images, item prices, and item names contained in the pages, and the indexer may index the corresponding location and meaning.

The label for determining the type of content denotes the meaning of an item, derived from the content viewed by the user (for example, in a web browser), images of content the user has tagged with a like (for example, on a social network), and the images and text of home pages viewed by the user, all of which are included in the user behavior information.

User browsing records are stored in the user terminals 500_1 to 500_N. The user browsing records include log records and engagement records. A log record is generated by recording the events that occur while the operating system or software of the user terminals 500_1 to 500_N is running.

The user terminals 500_1 to 500_N also store the labels for determining the type of content, which are extracted on the basis of the user browsing records.

The web service providing server provides different content depending on the type of application when the user terminals 500_1 to 500_N access it through the application. The web service providing servers 300_1 to 300_N may be implemented as an online shopping mall server, a search engine server, and the like.

FIG. 2 is a diagram for explaining the internal structure of an automatic video generation apparatus according to an embodiment of the present invention.

Referring to FIG. 2, the automatic video generation apparatus 200 includes a script generation unit 210, a scenario generation unit 220, a keyword extraction unit 230, a reference scene data transceiver 240, an environmental data generation unit 250, and a video synthesis unit 260.

When the video generation reference information received from the customer terminals 400_1 to 400_N is a word-level keyword, the script generation unit 210 may generate a script from a pre-built script database by using the object attribute corresponding to the keyword, the screen attribute of a scene matching the object, and the situation attribute of a scene matching the object.

In this embodiment, among the object attribute corresponding to the keyword, the screen attribute of a scene matching the object, and the situation attribute of a scene matching the object, the script generation unit 210 may select the attribute determined from the behavior information of users who have used content related to the customer, and may generate the script using text that matches the selected attribute.

The scenario generation unit 220 generates a scenario based on the script generated by the script generation unit 210. The scenario may include sound effects, mood, and the like.

The keyword extraction unit 230 extracts keywords from the script generated by the script generation unit 210.

More specifically, the keyword extraction unit 230 extracts words from the script text of the reference scene data by splitting on whitespace, and determines the frequency of each word from a pre-built word frequency database.

The keyword extraction unit 230 then performs morphological analysis on each word to generate tokens, each consisting of a (word, morpheme value) pair with a label indicating the word's frequency.

For example, the keyword extraction unit 230 may analyze the text of the script and generate tokens such as (frequency: 1000, (word, morpheme value)), (frequency: 234, (word, morpheme value)), (frequency: 2541, (word, morpheme value)), (frequency: 2516, (word, morpheme value)), and so on.
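A minimal sketch of this token-generation step is shown below. The `WORD_FREQUENCY` and `MORPHEME_LOOKUP` dictionaries are hypothetical stand-ins for the pre-built word frequency database and for a real morphological analyzer, neither of which is specified in the text.

```python
# Hypothetical stand-in for the pre-built word frequency database.
WORD_FREQUENCY = {"dog": 1000, "runs": 234, "happy": 2541}

# Hypothetical stand-in for a morphological analyzer.
MORPHEME_LOOKUP = {"dog": "noun", "runs": "verb", "happy": "adjective"}


def build_tokens(script_text):
    """Split on whitespace and attach a frequency label to each (word, morpheme) pair."""
    tokens = []
    for word in script_text.split():
        tokens.append(
            {
                "frequency": WORD_FREQUENCY.get(word, 0),
                "pair": (word, MORPHEME_LOOKUP.get(word, "unknown")),
            }
        )
    return tokens


tokens = build_tokens("happy dog runs")
```

Words missing from either lookup fall back to a frequency of 0 and the morpheme value `unknown`, which is a choice made for this sketch only.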

After generating the tokens as described above, the keyword extraction unit 230 assigns a different weight to each token according to the token's word and the token's label.

In one embodiment, the keyword extraction unit 230 assigns a different weight to each token according to the language in which the token's word is written (for example, English, Chinese, or Korean), the position of the word within the text of the script, and the frequency indicated by the label assigned to the token.

First, the keyword extraction unit 230 calculates a first weight using the total number of tokens generated from the text of the script and the order of each token.

In one embodiment, the keyword extraction unit 230 may calculate the first weight from the relative position of the token's order within the total number of tokens generated from the text of the script, together with an importance value predetermined for each type of language.

For example, when the total number of tokens is 12 and the token is 4th in order, the keyword extraction unit 230 may calculate "0.25" and then apply the importance value predetermined for the type of language to obtain the first weight.
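The first weight can be sketched as follows. The text does not give the formula, so this example makes two assumptions: the relative position is the zero-based offset of the token divided by the total token count (chosen because it reproduces the value 0.25 for the 4th of 12 tokens), and the language importance value is applied by multiplication. The `LANGUAGE_IMPORTANCE` values are invented for illustration.

```python
# Hypothetical importance values predetermined per language.
LANGUAGE_IMPORTANCE = {"ko": 1.0, "en": 0.8, "zh": 0.9}


def first_weight(position, total_tokens, language):
    """Relative position of a token (1-based `position`) scaled by a language value."""
    if total_tokens <= 0 or not 1 <= position <= total_tokens:
        raise ValueError("position must lie within 1..total_tokens")
    # Assumption: zero-based offset over the total, so the 4th of 12 gives 0.25.
    relative_position = (position - 1) / total_tokens
    # Assumption: the language importance value is applied as a multiplier.
    return relative_position * LANGUAGE_IMPORTANCE[language]


korean_weight = first_weight(4, 12, "ko")
english_weight = first_weight(4, 12, "en")
```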

In another embodiment, for each token generated from the text of the script, the keyword extraction unit 230 may calculate a second weight using the frequency indicated by the label assigned to that token and the frequencies indicated by the labels assigned to the preceding token and the following token.

The keyword extraction unit 230 then assigns a final weight using the first weight and the second weight.
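One possible way to combine these quantities is sketched below. The text states only which inputs are used, not the formulas, so both formulas here are assumptions: the second weight is taken as the token's frequency divided by the summed frequencies of the token and its two neighbors, and the final weight as the product of the first and second weights.

```python
def second_weight(frequencies, index):
    """Share of a token's frequency within the window (previous, current, next)."""
    current = frequencies[index]
    previous = frequencies[index - 1] if index > 0 else 0
    following = frequencies[index + 1] if index + 1 < len(frequencies) else 0
    window_total = previous + current + following
    # Assumed formula; the source does not define how the three values combine.
    return current / window_total if window_total else 0.0


def final_weight(first, second):
    """Assumed combination: the product of the two partial weights."""
    return first * second


# Frequencies taken from the token example in the text.
frequencies = [1000, 234, 2541, 2516]
weight_of_second_token = second_weight(frequencies, 1)
```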

The reference scene data transceiver 240 sends the reference video data recommendation apparatus 300 a reference video data recommendation request message containing a keyword composed of the differently weighted tokens, and receives reference video data from the reference video data recommendation apparatus 300.

The environmental data generation unit 250 may select sound data according to the scenario, convert the text data corresponding to the scenario into voice data, and generate an AI actor according to the scenario.

The video synthesis unit 260 generates video data by combining the reference scene data received by the reference scene data transceiver 240 with the environmental data generated by the environmental data generation unit 250.

FIG. 3 is a diagram for explaining the internal structure of a reference video data recommendation apparatus according to an embodiment of the present invention.

Referring to FIG. 3, the reference video data recommendation apparatus 300 collects video data, divides it into scene units to generate reference scene data, assigns a tag to each item of reference scene data, and stores the result in the reference scene database 330. The reference video data recommendation apparatus 300 includes a video data division unit 310, a tag allocation unit 320, the reference scene database 330, and a reference video data recommendation unit 340.

The video data division unit 310 collects video data and then divides it into scene units to generate reference scene data.

In one embodiment, the video data division unit 310 may decode the video data into images and then sample the images at playback time intervals.

In this embodiment, the video data division unit 310 may generate reference scene data by grouping the sampled images into scenes based on the similarity between adjacent images. Here, adjacent images are images that neighbor each other when the sampled images are arranged in playback order.

For example, the video data division unit 310 may compute the similarity between images by performing feature matching on adjacent images. For instance, the automatic video generation apparatus 200 may compare the feature points of adjacent images and group images whose similarity is at or above a predetermined level into one scene to generate the reference scene data.
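The grouping of adjacent images can be sketched as follows. As a simplification, each sampled image is represented by a set of feature identifiers and similarity is measured with the Jaccard index; this stands in for real feature matching on image feature points, and the threshold value 0.5 is an assumption.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_into_scenes(frame_features, threshold=0.5):
    """Group frame indices into scenes; a new scene starts when adjacent
    frames fall below the (hypothetical) similarity threshold."""
    scenes = []
    for index, features in enumerate(frame_features):
        if scenes and jaccard(frame_features[index - 1], features) >= threshold:
            scenes[-1].append(index)
        else:
            scenes.append([index])
    return scenes


# Each set stands for the feature points detected in one sampled image.
frames = [{1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4}, {7, 8, 9}, {7, 8}]
scene_groups = group_into_scenes(frames)
```

Only neighboring frames are compared, which matches the definition of adjacent images as those next to each other in playback order.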

In this embodiment, the video data division unit 310 may calculate the change in the number of objects extracted from each sampled image, determine from that change that the scene has switched, and generate reference scene data using that point in time as the boundary.

In this embodiment, the video data division unit 310 may determine, based on the audio data and subtitle data that make up the video data, that the point at which new content appears is a new scene, and generate reference scene data accordingly.

In this embodiment, the video data division unit 310 may extract objects from each sampled image and, when an object has disappeared or a new object has appeared, determine that a new scene has begun and generate reference scene data accordingly.
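The object-based scene boundary can be illustrated with a short sketch. It assumes that an object detector has already produced a set of object labels for each sampled image; the detector itself and the label strings are hypothetical.

```python
def split_on_object_change(frame_objects):
    """Start a new scene whenever an object disappears or a new one appears."""
    scenes = []
    previous = None
    for index, objects in enumerate(frame_objects):
        if previous is None or objects != previous:
            scenes.append([index])  # the object set changed: new scene
        else:
            scenes[-1].append(index)
        previous = objects
    return scenes


# Hypothetical detector output: one set of object labels per sampled image.
detected = [{"dog"}, {"dog"}, {"dog", "ball"}, {"dog", "ball"}, {"car"}]
object_scenes = split_on_object_change(detected)
```

Comparing whole sets catches the case where one object is replaced by another while the object count stays the same, which a pure count comparison would miss.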

The tag allocation unit 320 analyzes the reference scene data and assigns a tag to each item of reference scene data.

To this end, the tag allocation unit 320 extracts feature information from the reference scene data and assigns different types of tags according to the feature information.

In one embodiment, the tag allocation unit 320 extracts the features of an object included in the reference scene data, expresses them as a vector value to generate the object's feature information, and assigns an object attribute tag according to that feature information.

More specifically, the tag allocation unit 320 may detect the feature region of an object (interest point detection). Here, the feature region is the main region from which a feature descriptor is extracted, that is, a descriptor of the object's features used to determine whether objects are identical or similar.

In another embodiment, the tag allocation unit 320 may extract a feature descriptor from the feature region of the reference scene data (descriptor extraction) and assign a screen attribute tag to the reference scene data according to the feature descriptor. A feature descriptor expresses the features of the reference scene data as vector values.

In yet another embodiment, the tag allocation unit 320 trains a scene type analysis model on the reference scene data to extract the type of situation expressed in a scene, and assigns a situation attribute tag according to the type of situation. Here, the scene type means the type of situation expressed in each scene.

In this embodiment, the tag allocation unit 320 may build the scene type analysis model as a CNN deep learning model and train it on the data set described above. The CNN deep learning model may be designed to include two convolutional layers, a ReLU layer, a max pooling layer, and one fully connected layer.

In this embodiment, the tag allocation unit 320 may use the RCNN technique to arrange the convolution feature maps produced by the CNN, in map order, into feature sequences, and then feed each feature sequence into a long short-term memory (LSTM) network for training.

In yet another embodiment, the tag allocation unit 320 extracts a highlight portion from the video data and assigns a highlight attribute tag to the reference scene data corresponding to the highlight portion. The highlight portion may be a partial section extracted from the video data, which may be a section directly designated by the video data or a section that is extracted automatically.

The reference scene database 330 stores the reference video data to which tags have been assigned by the tag allocation unit 320.

The reference video data recommendation unit 340 compares, among the plurality of tags of the reference scene data, the tags that match the morpheme value of the token to calculate a similarity score, and extracts from the reference scene database 330 the reference scene data to which a tag with a similarity score at or above a specific score is assigned.

In one embodiment, the reference video data recommendation unit 340 extracts, from the reference scene database 330, reference scene data to which a tag matching the morpheme value of the token is assigned. It then compares the tags of the extracted reference scene data with the word of the token and, if they match, provides the corresponding reference scene data.

In this embodiment, when the morpheme value of the token is a noun, the reference video data recommendation unit 340 compares the object attribute tags, among the plurality of tags of the reference scene data extracted from the reference scene database 330, with the word of the token and, if they match, provides the corresponding reference scene data.

In this embodiment, when the morpheme value of the token is an adjective, the reference video data recommendation unit 340 compares the screen attribute tags and situation attribute tags, among the plurality of tags of the reference scene data extracted from the reference scene database 330, with the word of the token and, if they match, provides the corresponding reference scene data.

In another embodiment, for reference scene data extracted from the reference scene database 330 whose tags do not match the morpheme value of the token, the reference video data recommendation unit 340 may compare the plurality of tags of the reference scene data with the word of the token to calculate a similarity ratio, and may extract and provide the reference video data to which a tag with a similarity ratio at or above a specific score is assigned.

In this embodiment, the reference video data recommendation unit 340 matches the characters that make up each tag of the reference scene data against the characters that make up the word of the token and counts the matching characters. It compares the string length of the tag with the string length of the token's word and calculates the similarity ratio as the ratio of the number of matching characters to the longer of the two string lengths. It may then extract and provide the reference video data to which a tag with a similarity ratio at or above a specific score is assigned.

FIGS. 4 to 7 are diagrams for explaining an automatic video generation apparatus according to an embodiment of the present invention.

Referring to FIGS. 4 to 7, in order to generate a video automatically at a customer's request, the automatic video generation apparatus 200 collects video data 410, divides the video data 410 into scene units to generate reference scene data 420_1 to 420_N, assigns a tag to each item of reference scene data 420_1 to 420_N, and stores the result in the reference scene database 430.

In one embodiment, the automatic video generation apparatus 200 may decode the video data 410 into images and then sample the images at playback time intervals.

In this embodiment, the automatic video generation apparatus 200 may generate reference scene data by grouping the sampled images into scenes based on the similarity between adjacent images. Here, adjacent images are images that neighbor each other when the sampled images are arranged in playback order.

The automatic video generation apparatus 200 also analyzes the reference scene data 420_1 to 420_N and assigns a tag to each item of reference scene data.

To this end, the automatic video generation apparatus 200 extracts feature information from the reference scene data 420_1 to 420_N and assigns different types of tags according to the feature information.

In one embodiment, the automatic video generation apparatus 200 extracts the features of an object included in the reference scene data 420_1 to 420_N, expresses them as a vector value to generate the object's feature information, and assigns an object attribute tag according to that feature information.

For example, the automatic video generation apparatus 200 may extract the features of an object included in the reference scene data 420_3 of FIG. 6(a), express them as a vector value to generate the object's feature information as shown in FIG. 6(b), and assign an object attribute tag as shown in FIG. 6(a) according to that feature information.

More specifically, the automatic video generation apparatus 200 may detect the feature region of an object (interest point detection) as shown in FIG. 6(b). Here, the feature region is the main region from which a feature descriptor is extracted, that is, a descriptor of the object's features used to determine whether objects are identical or similar.

FIG. 8 is a flowchart for explaining an embodiment of the method for recommending reference video data for automatic video generation according to the present invention.

Referring to FIG. 8, the reference video data recommendation apparatus 300 collects video data and then divides it into scene units to generate reference scene data (step S810).

The reference video data recommendation apparatus 300 analyzes the reference scene data, either training on it or extracting feature information from it, and on that basis assigns different types of tags to each item of reference scene data (step S820).

The reference video data recommendation apparatus 300 stores the tagged reference video data in the reference video database (step S830).

On receiving a reference video data recommendation request message, the reference video data recommendation apparatus 300 extracts reference video data from the reference video database based on the message and provides it (step S840).

단계 S840에 대한 일 실시예에서, 참조 영상 데이터 추천 장치(300)는 서로 다른 다른 가중치가 부여된 토큰으로 구성된 키워드를 포함하는 참조 영상 데이터 추천 요청 메시지를 수신하고, 상기 참조 장면 데이터의 복수의 태그 중 상기 토큰의 형태소 값과 매칭되는 태그가 할당된 참조 장면 데이터를 추출하고, 상기 추출된 참조 장면 데이터의 태그 및 상기 토큰의 단어를 매칭시켜 일치하면 해당 참조 장면 데이터를 추출하여 제공한다.In one embodiment of step S840, the reference video data recommendation device 300 receives a reference video data recommendation request message including a keyword composed of tokens to which different weights are assigned, extracts reference scene data to which a tag matching the morpheme value of the token is assigned among a plurality of tags of the reference scene data, and matches the tag of the extracted reference scene data with the word of the token to extract and provide the corresponding reference scene data.

단계 S840에 대한 다른 일 실시예에서, 참조 영상 데이터 추천 장치(300)는 상기 서로 다른 다른 가중치가 부여된 토큰으로 구성된 키워드를 포함하는 참조 영상 데이터 추천 요청 메시지를 수신하고, 상기 참조 장면 데이터의 복수의 태그 중 상기 토큰의 형태소 값과 매칭되지 않은 태그가 할당된 상기 참조 장면 데이터의 복수의 태그 및 상기 토큰의 단어를 매칭시켜 유사 비율을 산출하고, 상기 유사 비율이 특정 점수 이상인 태그가 할당된 참조 영상 데이터를 추출하여 제공하는 단계를 포함한다.In another embodiment of step S840, the reference video data recommendation apparatus 300 receives a reference video data recommendation request message including a keyword composed of tokens to which different weights have been assigned, calculates a similarity ratio by matching a plurality of tags of the reference scene data to which a tag that does not match the morpheme value of the token among a plurality of tags of the reference scene data to which a tag does not match and a word of the token is assigned, and extracts and provides reference video data to which a tag having a similarity ratio of a specific score or more is assigned.

In the above embodiment, the reference video data recommendation device 300 matches the characters constituting each of the plurality of tags of the reference scene data against the characters constituting the word of the token and counts the matching characters. It compares the string length of the tag with the string length of the token's word and calculates the similarity ratio as the proportion of matching characters relative to the longer of the two strings. It then extracts and provides the reference video data to which a tag whose similarity ratio is equal to or greater than a specific score is assigned.
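
The similarity ratio described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the specification does not say how matching characters are counted, so the sketch counts characters shared by the two strings regardless of position (a multiset intersection), and the function names and the threshold value of 0.5 are hypothetical stand-ins for the "specific score".

```python
from collections import Counter
from typing import List


def similarity_ratio(tag: str, word: str) -> float:
    """Matching characters divided by the length of the longer string."""
    if not tag or not word:
        return 0.0
    # Assumption: a character "matches" if it occurs in both strings,
    # counted with multiplicity and without regard to position.
    common = Counter(tag) & Counter(word)
    matched = sum(common.values())
    return matched / max(len(tag), len(word))


def similar_tags(tags: List[str], word: str, threshold: float = 0.5) -> List[str]:
    """Return the tags whose similarity ratio meets the (hypothetical) threshold."""
    return [t for t in tags if similarity_ratio(t, word) >= threshold]


ratio = similarity_ratio("beach", "beaches")   # 5 shared characters / 7
selected = similar_tags(["beach", "sea"], "beaches")
```

Dividing by the longer string keeps the ratio at or below 1 and penalizes a short tag that happens to be contained in a much longer word, which is consistent with using the longer string as the basis.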

Although the present invention has been described with reference to limited embodiments and drawings, it is not limited to those embodiments, and a person of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description. Accordingly, the spirit of the present invention should be understood only from the claims set forth below, and all equal or equivalent modifications thereof fall within the scope of the spirit of the invention.

200: automatic video generation device
210: script generation unit
220: scenario generation unit
230: keyword extraction unit
240: reference scene data transceiver unit
250: environment data generation unit
260: video synthesis unit
300: reference video data recommendation device
310: video data division unit
320: tag assignment unit
330: reference scene database
340: reference video data recommendation unit
400_1~400_N: customer terminals
500_1~500_N: user terminals

Claims

A method for recommending reference video data for automatic video generation, executed in a reference video data recommendation device, the method comprising:
collecting video data and then dividing the video data into scene units to generate reference scene data;
analyzing the reference scene data to train on the reference scene data or to extract feature information, and assigning different types of tags to each item of the reference scene data on that basis; and
storing the reference video data to which the tags are assigned in a reference video database,
wherein the step of analyzing the reference scene data and assigning different types of tags comprises:
extracting feature information of an object included in the reference scene data, expressing the feature information of the object as a vector value to generate the object feature information, and assigning an object attribute tag according to the object feature information;
training a scene type analysis model with the reference scene data to extract the type of situation represented in the scene, and assigning a situation attribute tag according to the type of situation; and
extracting a highlight portion from the video data and assigning a highlight attribute tag to the reference scene data corresponding to the highlight portion.

The method of claim 1, further comprising:
receiving a reference video data recommendation request message including a keyword composed of tokens to which different weights are assigned;
extracting reference scene data to which a tag matching the morpheme value of the token is assigned, from among the plurality of tags of the reference scene data; and
matching the tags of the extracted reference scene data against the word of the token and, if they match, extracting and providing the corresponding reference scene data.

The method of claim 1, comprising:
receiving a reference video data recommendation request message including a keyword composed of the tokens to which different weights are assigned;
for reference scene data to which a tag not matching the morpheme value of the token is assigned, from among the plurality of tags of the reference scene data, matching the plurality of tags of that reference scene data against the word of the token to calculate a similarity ratio; and
extracting and providing reference video data to which a tag whose similarity ratio is equal to or greater than a specific score is assigned.

The method of claim 3, wherein the step of extracting and providing reference video data to which a tag whose similarity ratio is equal to or greater than a specific score is assigned comprises:
matching the characters constituting each of the plurality of tags of the reference scene data against the characters constituting the word of the token to count the matching characters;
comparing the string length of the plurality of tags with the string length of the word of the token, and calculating the similarity ratio as the proportion of matching characters relative to the longer string; and
extracting and providing reference video data to which a tag whose similarity ratio is equal to or greater than a specific score is assigned.

An apparatus for providing reference video data for automatic video generation, the apparatus comprising:
a reference scene data extraction unit that collects video data and then divides the video data into scene units to generate reference scene data;
a tag assignment unit that analyzes the reference scene data to train on the reference scene data or to extract feature information, and assigns different types of tags to each item of the reference scene data on that basis; and
a reference video database construction unit that stores the reference video data to which the tags are assigned in a reference video database,
wherein the tag assignment unit:
extracts feature information of an object included in the reference scene data, expresses the feature information of the object as a vector value to generate the object feature information, and assigns an object attribute tag according to the object feature information;
trains a scene type analysis model with the reference scene data to extract the type of situation represented in the scene, and assigns a situation attribute tag according to the type of situation; and
extracts a highlight portion from the video data and assigns a highlight attribute tag to the reference scene data corresponding to the highlight portion.

The apparatus of claim 5, further comprising a reference video data recommendation unit that receives a reference video data recommendation request message including a keyword composed of tokens to which different weights are assigned, extracts reference scene data to which a tag matching the morpheme value of the token is assigned from among the plurality of tags of the reference scene data, and matches the tags of the extracted reference scene data against the word of the token and, if they match, extracts and provides the corresponding reference scene data.

The apparatus of claim 6, wherein the reference video data recommendation unit receives a reference video data recommendation request message including a keyword composed of the tokens to which different weights are assigned; for reference scene data to which a tag not matching the morpheme value of the token is assigned, from among the plurality of tags of the reference scene data, matches the plurality of tags of that reference scene data against the word of the token to calculate a similarity ratio; and extracts and provides reference video data to which a tag whose similarity ratio is equal to or greater than a specific score is assigned.

The apparatus of claim 7, wherein the reference video data recommendation unit matches the characters constituting each of the plurality of tags of the reference scene data against the characters constituting the word of the token to count the matching characters; compares the string length of the plurality of tags with the string length of the word of the token and calculates the similarity ratio as the proportion of matching characters relative to the longer string; and extracts and provides reference video data to which a tag whose similarity ratio is equal to or greater than a specific score is assigned.