KR20120071194A

KR20120071194A - Apparatus of recommending contents using user reviews and method thereof

Info

Publication number: KR20120071194A
Application number: KR1020100132824A
Authority: KR
Inventors: 김상희
Original assignee: 주식회사 케이티
Priority date: 2010-12-22
Filing date: 2010-12-22
Publication date: 2012-07-02

Abstract

PURPOSE: A content recommendation apparatus using user reviews, and a method thereof, are provided to recommend content that matches a user's preferences by offering content similar to the content the user is currently watching. CONSTITUTION: A review search unit (200) searches for one or more review texts corresponding to each content item based on metadata. A review analysis unit (210) extracts key words from the retrieved review texts through a natural language analysis technique and stores the extracted key words in a review analysis database (DB) (240). The review analysis database building apparatus stores the extracted key words in the form of a feature vector.

Description

Apparatus of recommending contents using user reviews and method thereof

The present invention relates to an apparatus and method for recommending content and, more particularly, to an apparatus and method that use user reviews to recommend to a user content with characteristics similar to the content currently being viewed.

Conventional video on demand (VOD) search compares keywords in the metadata of each content item (for example, genre, cast, synopsis, title, and production year) to find other content that shares a keyword. The number of IPTV-based VOD titles has reached about 90,000 and is expected to grow rapidly under the influence of open IPTV. Metadata is useful for locating a specific item within a large catalog, but a user who simply wants to find a movie or video worth watching must spend a great deal of time searching on the basis of metadata. For example, movies featuring actor A can be found by entering 'actor A' in the cast field, but this has the drawback that the user must remember keywords that distinguish each content item from the others. Because users do not remember such characteristics for most movies, conventional keyword search over metadata is limited.

There is also a conventional recommendation method based on collaborative filtering, which recommends a VOD list without requiring the user to search. This method analyzes the user's VOD purchase pattern, compares it with those of other users who have similar purchase patterns, and recommends VOD content that the current user has not yet watched. However, because this conventional method centers on past purchase patterns, the recommended list is likely not to contain the movie the viewer wants to watch right now, and the method is usable only once past purchase patterns have accumulated.

The technical problem addressed by the present invention is to provide an apparatus and method for recommending content using user reviews that, by recommending content with characteristics similar to the content the user is currently watching, reduce the time the user spends finding what to watch next and, by recommending content that matches the user's preferences, increase the likelihood of purchase.

To achieve the above technical object, an embodiment of a review DB building apparatus according to the present invention includes: a review search unit that searches for at least one review text corresponding to each content item based on metadata; and a review analysis unit that extracts key words from the retrieved review text through a natural language analysis technique and stores them in a review analysis database.

To achieve the above technical object, an embodiment of a review analysis database building method according to the present invention is a method by which a content recommendation apparatus builds a review analysis database, the method including: searching a review site for, and downloading, at least one review text corresponding to each content item based on metadata; and a review analysis step of extracting key words from the downloaded review text through a natural language analysis technique and storing them.

To achieve the above technical object, an embodiment of a content recommendation apparatus according to the present invention includes: a review analysis database that stores review data for each content item; a content search unit that searches the review analysis database for content whose review data is similar to the review data of the content currently being viewed; and a content recommendation unit that recommends the retrieved content to the user.

To achieve the above technical object, an embodiment of a content recommendation method according to the present invention includes: searching a review analysis database, which stores review data for each content item, for content whose review data is similar to the review data of the content currently being viewed; and recommending the retrieved content to the user.

According to the present invention, recommending content with characteristics similar to the content the user is currently watching reduces the time the user spends looking for other content, and recommending content that matches the user's preferences increases the likelihood of purchase. In addition, the review-based recommendation method has the advantage that it can recommend similar content even when no long-term history of the user's viewing and purchases is available.

FIG. 1 is a diagram illustrating an example of a process of building a database of user reviews according to the present invention;
FIG. 2 is a diagram illustrating the configuration of an example of a review analysis database building apparatus according to the present invention;
FIG. 3 is a flowchart illustrating an example of a method of building a review analysis database according to the present invention;
FIG. 4 is a diagram illustrating an example of a content recommendation process using user reviews according to the present invention;
FIG. 5 is a diagram illustrating the configuration of an example of a content recommendation apparatus according to the present invention; and
FIG. 6 is a flowchart illustrating an example of a content recommendation method according to the present invention.

Hereinafter, an apparatus and method for recommending content using user reviews according to the present invention are described in detail with reference to the accompanying drawings.

When similarity between content items is compared, the synopsis is the main basis in the case of movies. However, a synopsis is written to convey what the movie is about and therefore contains a specific story line or abstract metaphors. Because a synopsis is distributed uniformly through the film's production company, it is difficult to find another movie with a synopsis of the same content, but it is possible to search for movies with similar viewer impressions, that is, similar reviews.

For example, the synopsis of the movie 'Wine Miracle' (2008, Bottle Shock) is as follows: "A vineyard in California that dreams of producing the finest wine in history; Jim, its proud owner; and Bo, his immature only son. Gustavo, a farmhand who can name the grape variety and even the vintage from a single sip of wine, is barely keeping the near-bankrupt farm afloat. One day Sam, a lively and beautiful young woman who dreams of becoming a master winemaker, arrives at the farm, and Spurrier, a wine shop promoter from France, comes looking for wines to enter in a blind tasting. Bo, who had been skeptical about wine, begins to hold on to one last hope. But the white wine, which should be golden, turns brown just before the competition..."

The review of 'Wine Miracle' downloaded from a review site on the Internet reads: "A wonderful movie, a movie showing the miracle of wine, a European-style American movie, a calm movie, a cute movie".

In other words, a review is a short text of a few words in which a user records impressions of a content item. When the content is a movie, a review is therefore an overall appraisal of the movie rather than a plot summary.

It is therefore difficult to find other movies whose plot resembles the plot given in a synopsis, but reviews make it possible to find movies that leave a similar impression. For example, 'Wine Miracle' is classified under the drama genre in the metadata of the VOD DB, but a search based on metadata alone cannot tell whether a given drama offers the same calm or wonderful emotional experience as this one or instead leaves a feeling of sadness or regret. Using other users' reviews, however, movies such as 'Vitus' or 'Amazing Grace' can be retrieved as similar content, which is more practical.

The present invention broadly consists of two parts: a configuration that first builds a review DB so that user reviews can be used, and a configuration that uses the resulting review DB to search for and recommend content. These are described in turn.

FIG. 1 is a diagram illustrating an example of a process of building a database of user reviews according to the present invention.

Referring to FIG. 1, because different content items (VOD titles and the like) may share the same title, metadata for the content (for example, title, cast, and producer) is first retrieved from the VOD DB (100), and reviews of that content are then searched for and downloaded from a movie review site (110) on the basis of the metadata. For example, when searching for reviews, candidates can be compared by title, production year, producer, and cast, in that order, to download the desired reviews. The movie review site may be any of the various Internet sites that carry user reviews of content. Once reviews for a content item have been downloaded, a unique identifier may be assigned to each content item so that its reviews can be looked up by that identifier. The review analysis database (120) can of course also be built so that related content can be found from a review.
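The field-by-field comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the `ContentMeta` record, the `match_review_entry` function, and the choice to skip a field when no candidate agrees on it are all hypothetical and are not specified in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentMeta:
    """Hypothetical metadata record shared by the VOD DB and a review site."""
    title: str
    year: int
    producer: str
    actors: frozenset


def match_review_entry(vod, candidates):
    """Pick the review-site entry for a VOD item.

    Candidates are filtered by title, then narrowed by production year,
    producer, and cast, in that order, until one candidate remains.
    """
    remaining = [c for c in candidates if c.title == vod.title]
    checks = (
        lambda c: c.year == vod.year,
        lambda c: c.producer == vod.producer,
        lambda c: bool(c.actors & vod.actors),
    )
    for same in checks:
        if len(remaining) <= 1:
            break
        narrowed = [c for c in remaining if same(c)]
        # Assumption: if no candidate agrees on this field, skip the field
        # rather than discard every candidate.
        if narrowed:
            remaining = narrowed
    return remaining[0] if remaining else None
```

Checking the cheapest and most selective fields first keeps the number of comparisons small when many entries share a title.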

The downloaded reviews are analyzed using natural language processing and a statistics-based analysis method. In general, natural language processing is used to extract key words from a review, and the statistics-based analysis method is used to convert the extracted key words into a feature vector so that texts can be compared efficiently. A key word extracted from a review is a word used to summarize a multi-word text or to compare it with other texts.

A keyword-based text processing method determines key words by comparing the frequency of each word, so the most frequently repeated word is used when building the index DB for a text. Indexing is essential to text-based information retrieval, and efficient indexing is needed to provide fast and accurate search. For example, if the review of 'Wine Miracle' reads 'a wonderful movie, a movie showing the miracle of wine, a European-style American movie, a calm movie, a cute movie', the word 'movie' is mentioned most often but is a poor key word for the index DB. The modifiers, such as 'wonderful', 'miracle of wine', and 'calm', are instead the appropriate key words to store in the review DB.
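The weakness of pure frequency counting can be seen in a few lines. The sketch below counts words in the English rendering of the example review; the small stop-word set and the `most_frequent_word` helper are assumptions made for the illustration.

```python
import re
from collections import Counter

# Assumption: a tiny stop-word list so that articles do not dominate the count.
STOP_WORDS = {"a", "the", "of"}


def most_frequent_word(review):
    """Return (word, count) for the most frequent non-stop word."""
    tokens = [t for t in re.findall(r"[a-z]+", review.lower())
              if t not in STOP_WORDS]
    return Counter(tokens).most_common(1)[0]


review = ("a wonderful movie, a movie showing the miracle of wine, "
          "a European-style American movie, a calm movie, a cute movie")
word, count = most_frequent_word(review)
print(word, count)  # movie 5
```

The top-ranked word says nothing about how this movie differs from any other, which is why the modifiers are the better index terms.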

In addition, a keyword-based text processing method can analyze text only at the surface level, so it is not suitable for comparing short texts. Unlike a web document, a short document such as a review consists of only a few keywords, so two texts are unlikely to contain the same keyword. For example, 'ingenious imagination' and 'outstanding creativity' are reviews with the same content, but a keyword-based method cannot match them because they share no keyword. Natural language processing, by contrast, analyzes the meaning of the keywords, confirms that 'imagination' and 'creativity' are synonyms, and retrieves the two as having the same content. Because the same word can express several meanings, a synonym search must analyze how a given word is used in the current text. Synonym extraction therefore relies on natural language processing capable of deep-level analysis. A further drawback of the keyword-based approach is that it cannot determine whether two reviews agree or contradict each other. For example, 'ingenious imagination' and 'low-grade imagination' are opposite reviews, yet a search on the keyword 'imagination' retrieves both alike.

The present invention therefore uses natural language processing rather than keyword-based text search. Natural language processing automates human language activity so that a computer can interact with people in a more natural way. It has now matured enough to perform such analysis with high accuracy and speed and is applied to fields such as bio-related and web document analysis. For example, it allows a computer not only to recognize speech but also to converse with a person using context-appropriate understanding. Using such conversational ability in practice requires large language resources and complex processing and carries a high probability of error. The present invention therefore uses conventional natural language processing but, to keep the processing less complex and less error-prone, analyzes reviews through the following steps.

Step 1: Tokenization

The text is split into tokens, which are the units of processing. For example, the text 'ingenious imagination' is separated into two tokens, 'ingenious' and 'imagination'. Tokenization splits the text at the spaces between words or at special characters (for example, '-' or '?').

Step 2: Syntactic analysis

The part of speech of each token is determined, for example 'ingenious/adj (adjective)' and 'imagination/nn (noun)'. Rather than relying on grammar rules alone, this syntactic analysis analyzes a large amount of text in advance to build the parsing formats of commonly used sentences, and then statistically selects the parsing format whose structure is most similar to the text being analyzed.

Step 3: Word relationship analysis (typed dependency analysis)

Once the part of speech of each token has been determined, the relationships between the words are analyzed. For example, in the 'ingenious' and 'imagination' analyzed above, 'ingenious' is a modifier of 'imagination'. The relationship is therefore expressed as amod('imagination', 'ingenious'), where amod denotes a modifier relationship. This word relationship analysis extracts the relationships between the words of a sentence using predefined grammatical relations. Depending on the embodiment, dozens or more word relationships may be defined in advance.

For example, the review 'uninteresting movie' first goes through the tokenization of Step 1 and is split into 'uninteresting' and 'movie'; the part of speech of each token is determined in Step 2; and the word relationship analysis then yields neg('movie', 'uninteresting'), where neg indicates that a negative expression is included.
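The three steps above can be sketched end to end as follows. This is a toy illustration, not the statistical parser the description refers to: the `LEXICON` table, the tag names, and the single adjective-before-noun rule are assumptions chosen so that the two worked examples reproduce.

```python
import re

# Hypothetical toy lexicon: word -> (part of speech, carries negation?).
LEXICON = {
    "ingenious": ("adj", False),
    "uninteresting": ("adj", True),
    "imagination": ("nn", False),
    "movie": ("nn", False),
}


def tokenize(text):
    """Step 1: split at whitespace and special characters such as '-' or '?'."""
    return [t for t in re.split(r"[\s\-?!,.]+", text.lower()) if t]


def tag(tokens):
    """Step 2: look up each token; unknown words get the tag 'unk'."""
    return [(t, *LEXICON.get(t, ("unk", False))) for t in tokens]


def relations(tagged):
    """Step 3: emit amod/neg for an adjective directly followed by a noun."""
    rels = []
    for (w1, pos1, neg1), (w2, pos2, _) in zip(tagged, tagged[1:]):
        if pos1 == "adj" and pos2 == "nn":
            rels.append(("neg" if neg1 else "amod", w2, w1))
    return rels


print(relations(tag(tokenize("ingenious imagination"))))
# [('amod', 'imagination', 'ingenious')]
print(relations(tag(tokenize("uninteresting movie"))))
# [('neg', 'movie', 'uninteresting')]
```

A production system would replace the lexicon lookup with a trained tagger and parser; the point here is only the shape of the data passed from one step to the next.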

Review analysis is used for morphological conversion (for example, 'imaginary' => 'imagination'), synonym mapping (for example, 'imagination' vs. 'creativity'), and key word extraction (for example, 'ingenious imagination' => 'imagination'). Morphological conversion and synonym mapping in particular are useful for converting diverse reviews into a fixed format.
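A minimal sketch of this normalization, assuming two small hand-written tables (`BASE_FORMS` and `SYNONYMS`, both hypothetical) and assuming that the key word of a relation is its head word:

```python
# Hypothetical lookup tables; a real system would use a dictionary or synonym DB.
BASE_FORMS = {"imaginary": "imagination"}
SYNONYMS = {"creativity": "imagination"}


def normalize(word):
    """Apply morphological conversion, then synonym mapping."""
    w = word.lower()
    w = BASE_FORMS.get(w, w)
    return SYNONYMS.get(w, w)


def key_word(relation):
    """Take the head of a (type, head, modifier) relation as the key word."""
    _, head, _ = relation
    return normalize(head)


a = key_word(("amod", "imagination", "ingenious"))
b = key_word(("amod", "creativity", "outstanding"))
print(a, b, a == b)  # imagination imagination True
```

After normalization the two reviews that shared no surface keyword map to the same stored form, which is what allows them to be matched later.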

FIG. 2 is a diagram illustrating the configuration of an example of an apparatus for building the review analysis DB according to the present invention, used for the review database process of FIG. 1, and FIG. 3 is a flowchart illustrating an example of a method of building the review analysis DB. The following description refers to FIGS. 2 and 3 together.

Referring first to FIG. 2, the review analysis DB building apparatus includes a review search unit (200) connected to a review site (230), a review analysis unit (210), a VOD DB (220), and a review analysis DB (240).

The VOD DB (220) is a database that stores content items and their metadata (title, cast, producer, synopsis, production year, genre, and so on), and the review analysis DB (240) is a database that stores the analyzed reviews.

The review search unit (200) searches the review site (230) for reviews of content items and downloads them (S300). Because several content items may share the same title, the review search unit (200) does not search by title alone: it first retrieves the metadata of the content from the VOD DB (220) and then, based on that metadata, searches the review site (230) for reviews of that content and downloads them.

The review analysis unit (210) analyzes the reviews of each content item downloaded by the review search unit (200) using the natural language processing described with reference to FIG. 1 and extracts key words (S310). That is, the review analysis unit (210) separates the review text of each content item into tokens, determines their parts of speech, analyzes the relationships between the words, extracts the key words, and stores them in the review analysis DB (240). Where necessary, the review analysis unit (210) may use a dictionary, a synonym DB, or the like to apply synonym mapping and morphological conversion to the key words so that reviews are stored in a uniform format.

More than one review may be retrieved and downloaded for a single content item, in which case the review analysis unit (210) gives more weight to reviews that better distinguish the content from other content. This weighting uses a statistics-based analysis method: after extracting the key words of an analyzed review, the review analysis unit (210) checks their frequency across the entire review analysis DB (240) and assigns a low weight to high-frequency words and a high weight to low-frequency words, so that more distinctive reviews carry more weight.
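One common way to realize "frequent words get low weight, rare words get high weight" is an inverse document frequency. The source does not name a formula, so the `log(N / df) + 1` form below is an assumption, as are the function name and the shape of the `db` argument.

```python
import math
from collections import Counter


def key_word_weights(db):
    """Map each key word to a weight that shrinks as the word gets more common.

    db: content id -> set of key words stored for that content item.
    Assumed formula: log(N / df) + 1, where df is the number of content
    items whose reviews contain the word and N is the number of items.
    """
    n = len(db)
    doc_freq = Counter(w for words in db.values() for w in set(words))
    return {w: math.log(n / df) + 1.0 for w, df in doc_freq.items()}


db = {
    "content-1": {"calm", "wonderful", "movie"},
    "content-2": {"sad", "movie"},
    "content-3": {"calm", "movie"},
}
weights = key_word_weights(db)
print(weights["movie"])  # 1.0, the lowest weight, since every item has it
```

With this choice a word that appears in every item still keeps a small positive weight instead of dropping to zero.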

So that similar reviews can later be searched and analyzed easily, the review analysis unit (210) stores the characteristics of each review, as determined through review analysis, key word extraction, weighting, and so on, in the form of a feature vector. The VOD DB (220) is of course also preferably stored in the form of feature vectors. A feature vector represents the n features that describe an object as a vector in n dimensions and is widely used for tasks such as comparing the features of objects; a detailed description is outside the scope of the present invention and is omitted.
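As a rough illustration of such a vector, the sketch below places each key word's weight at a fixed position given by a shared vocabulary. The vocabulary ordering, the default weight of 1.0 for words without a stored weight, and the function name are assumptions for the example.

```python
def to_feature_vector(key_words, vocabulary, weights):
    """Build an n-dimensional vector, one dimension per vocabulary word.

    A dimension holds the word's weight if the word occurs among the
    content item's key words, and 0.0 otherwise.
    """
    present = set(key_words)
    return [weights.get(w, 1.0) if w in present else 0.0 for w in vocabulary]


vocabulary = ["calm", "cute", "sad"]
weights = {"calm": 2.0, "cute": 1.5}
print(to_feature_vector({"calm", "sad"}, vocabulary, weights))  # [2.0, 0.0, 1.0]
```

Because every content item is projected onto the same vocabulary, any two vectors can be compared dimension by dimension.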

FIG. 4 is a diagram illustrating an example of a content recommendation process using user reviews according to the present invention.

Referring to FIG. 4, the VOD DB (410) is searched on the basis of the metadata (400) of the content the user is currently watching to find content with similar features, and the review analysis DB (430) is searched on the basis of the reviews (420) of that content to find content with similar features. The results of the metadata-based search and the review-based search are combined to determine similarity, with a higher weight given to the review-based results. Content whose combined similarity is at or above a preset threshold is identified as recommended content (440). When content similar to what the user is watching is recommended in this way, it is reported that, statistically, 25% of users purchase the recommended content within 24 hours.

FIG. 5 is a diagram illustrating an example of an apparatus for the content recommendation process shown in FIG. 4, and FIG. 6 is a flowchart illustrating an example of the corresponding method. The following description refers to FIGS. 5 and 6 together.

Referring to FIG. 5, the content recommendation apparatus includes a review analysis DB (500), a VOD DB (510), a content search unit (520), and a content recommendation unit (530).

The review analysis DB (500) is a database in which the key words extracted by analyzing the reviews of each content item, through the review analysis process described with reference to FIGS. 1 to 3, are stored; the analysis results are stored in the form of feature vectors using a statistical analysis method. The VOD DB (510) stores the metadata of the content items in the form of feature vectors.

The content search unit (520) identifies the metadata and reviews of the content the user is currently watching and, on that basis, searches the VOD DB (510) and the review analysis DB (500), respectively, to find similar content (S600, S610). The content search unit (520) may look up the metadata and reviews of the content currently being viewed in the VOD DB (510) and the review analysis DB (500), or it may download reviews from a review site in real time and analyze them. When the VOD DB (510) and the review analysis DB (500) store data in the form of feature vectors, the content search unit (520) compares the feature vectors of the metadata and reviews of the content currently being viewed with the feature vectors in the VOD DB (510) and the review analysis DB (500) to find similar content.
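The source does not say how two feature vectors are compared. Cosine similarity is one widely used measure and is assumed in the sketch below; the function names and the dictionary layout of the DB are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_similar(current_vector, db):
    """Return (similarity, content id) pairs, most similar first.

    db: content id -> feature vector, as stored in a review analysis DB
    or a VOD DB.
    """
    scored = [(cosine_similarity(current_vector, vec), cid)
              for cid, vec in db.items()]
    return sorted(scored, reverse=True)


db = {"x": [1.0, 0.0], "y": [0.0, 1.0]}
print(rank_similar([1.0, 0.0], db))  # [(1.0, 'x'), (0.0, 'y')]
```

The same routine can be run once against the metadata vectors and once against the review vectors to obtain the two similarity lists that are combined in the next stage.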

The content recommendation unit (530) combines the similarity results that the content search unit (520) obtained from the metadata-based search and the review-based search, giving a higher weight to the review-based results (S620). When the combined similarity of the metadata and review search results is at or above a preset threshold (S630), the content recommendation unit (530) provides the content as recommended content (S640). If several content items are at or above the threshold, a recommended content list is created in descending order of similarity and provided to the user.
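Steps S620 to S640 can be sketched as a weighted sum followed by a threshold and a sort. The linear combination, the review weight of 0.7, and the threshold of 0.5 are illustrative assumptions; the source states only that reviews receive the higher weight and that a preset threshold is applied.

```python
def recommend(meta_sim, review_sim, review_weight=0.7, threshold=0.5):
    """Combine two similarity maps and return content ids, best first.

    meta_sim, review_sim: content id -> similarity in [0, 1].
    review_weight > 0.5 gives the review-based result the higher weight.
    """
    combined = {}
    for cid in meta_sim.keys() | review_sim.keys():
        score = ((1.0 - review_weight) * meta_sim.get(cid, 0.0)
                 + review_weight * review_sim.get(cid, 0.0))
        if score >= threshold:  # S630: keep only items at or above the threshold
            combined[cid] = score
    # S640: list recommended items in descending order of combined similarity
    return sorted(combined, key=combined.get, reverse=True)


meta = {"a": 0.9, "b": 0.2, "c": 0.5}
review = {"a": 0.4, "b": 0.9, "c": 0.1}
print(recommend(meta, review))  # ['b', 'a']
```

Item "b" outranks "a" despite its weak metadata match because the review-based similarity dominates the combined score.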

The present invention can also be embodied as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium can also be distributed over computer systems connected through a network so that the computer-readable code is stored and executed in a distributed manner.

The present invention has been described above with a focus on its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the invention can be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the present invention is set out in the claims rather than in the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

Claims

A review analysis database construction apparatus comprising:
a review search unit that searches for at least one review text corresponding to each content based on metadata; and
a review analysis unit that extracts key words from the retrieved review texts through a natural language analysis technique and stores the extracted key words in a review analysis database.

The apparatus of claim 1, wherein the review analysis unit
separates the review text into words as processing units, determines a part of speech for each word, analyzes the relationships between the words, extracts the key words, and stores them in the form of a feature vector.

The apparatus of claim 2, wherein the review analysis unit
parses the review text using a parsing format constructed by analyzing a large amount of text, and analyzes the relationships between words using grammatical relations.

The apparatus of claim 1, wherein the review analysis unit
assigns a low weight to high-frequency key words and a high weight to low-frequency key words, based on the frequency of the key words in the review analysis database.

A method by which a content recommendation device builds a review analysis database, the method comprising:
searching for and downloading, from a review site, at least one review text corresponding to each content based on metadata; and
a review analysis step of extracting key words from the downloaded review text through a natural language analysis technique and storing the extracted key words.

The method of claim 5, wherein the review analysis step comprises
separating the downloaded review text into words through a natural language analysis technique, determining a part of speech for each word, analyzing the relationships between the words, extracting the key words, and storing them in the form of a feature vector.

The method of claim 6, wherein the review analysis step comprises
parsing the review text using a parsing format constructed by analyzing a large amount of text, and analyzing the relationships between words using grammatical relations.

The method of claim 5, wherein the review analysis step comprises
assigning a low weight to high-frequency key words and a high weight to low-frequency key words, based on the frequency of the key words in the review analysis database.

A content recommendation device comprising:
a review analysis database that stores review data for each content;
a content search unit that searches the review analysis database for content whose review data is similar to the review data of the content currently being viewed; and
a content recommendation unit that recommends the retrieved content to the user.

The device of claim 9, wherein
the content search unit compares, through a natural language processing technique, the review data for the content currently being viewed in the review analysis database with the review data for the other contents, and searches for content whose similarity is equal to or greater than a predetermined value.

The device of claim 9,
further comprising a VOD database that stores metadata about each content,
wherein the content search unit searches for similar content by searching the review analysis database and the VOD database based on the review data and metadata of the content currently being viewed, and assigns a higher weight to the search results from the review analysis database than to the search results from the VOD database.

A content recommendation method comprising:
searching, in a review analysis database storing review data for each content, for content whose review data is similar to the review data of the content currently being viewed; and
recommending the retrieved content to a user.

The method of claim 12, wherein the searching comprises
comparing, through a natural language processing technique, the review data for the content currently being viewed in the review analysis database with the review data for the other contents, and searching for content whose similarity is equal to or greater than a predetermined value.

The method of claim 12, wherein the searching comprises:
a first search step of searching for similar content by comparing the review data of the content currently being viewed with the review data in the review analysis database;
a second search step of searching for similar content by comparing the metadata of the content currently being viewed with the metadata of each content stored in the VOD database;
summing the two search results after assigning a higher weight to the result of the first search step than to the result of the second search step; and
extracting content whose summed result is equal to or greater than a predetermined threshold value.
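
The frequency-based weighting and the similarity comparison recited in the claims above can be illustrated with a short sketch. It is not taken from the specification: the claims do not give a weighting formula or a similarity measure, so the smoothed inverse-frequency weight, the cosine similarity, and the names `build_vectors` and `cosine` are assumptions chosen for the example. "Frequency in the review analysis database" is interpreted here as the number of contents whose reviews contain the key word.

```python
import math
from collections import Counter
from typing import Dict, List


def build_vectors(keywords_by_content: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Turn each content's key words into a sparse feature vector in which
    key words common across the database get low weights and rare ones
    get high weights (assumed smoothed inverse-frequency formula)."""
    n_contents = len(keywords_by_content)

    # Number of contents in which each key word appears.
    doc_freq: Counter = Counter()
    for words in keywords_by_content.values():
        doc_freq.update(set(words))

    vectors: Dict[str, Dict[str, float]] = {}
    for content_id, words in keywords_by_content.items():
        counts = Counter(words)
        vectors[content_id] = {
            word: count * (math.log((1 + n_contents) / (1 + doc_freq[word])) + 1.0)
            for word, count in counts.items()
        }
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical key words extracted from reviews of three contents.
    keywords = {
        "m1": ["heist", "twist", "funny"],
        "m2": ["heist", "twist", "slow"],
        "m3": ["romance", "funny", "slow"],
    }
    vectors = build_vectors(keywords)
    print(cosine(vectors["m1"], vectors["m2"]) > cosine(vectors["m1"], vectors["m3"]))
```

In this hypothetical data, "romance" appears in only one content and therefore receives a higher weight than "funny", which appears in two. A search step could then keep only contents whose similarity to the currently viewed content is at or above the predetermined value.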