KR101501610B1

KR101501610B1 - System and method for quotation/speaker recognition

Info

Publication number: KR101501610B1
Application number: KR1020090052872A
Authority: KR
Inventors: 이현수; 이해웅; 한형동
Original assignee: NAVER Corporation (네이버 주식회사)
Priority date: 2009-06-15
Filing date: 2009-06-15
Publication date: 2015-03-12
Anticipated expiration: 2029-06-15
Also published as: KR20100134312A

Abstract

A quotation/speaker recognition system and method are disclosed. The quotation/speaker recognition system may include a quotation/speaker recognition unit that analyzes a document and recognizes quotations and speakers, a speaker tagging unit that tags each quotation with its recognized speaker, and a search result providing unit that, upon receiving a search request for a speaker, searches for the quotations uttered by that speaker and provides them as search results.

Quotation, speaker, document, sentence structure, morphological analyzer, sentence pattern

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a system and method for recognizing quotations and speakers in sentences, and more particularly, to a system and method that recognize a quotation in a sentence through morphological analysis and pattern processing and recognize the speaker who uttered the quotation.

As Internet technology advances, many services have appeared that extract specific information from documents and provide it to users. In particular, there is demand for new services that provide users with more useful information by analyzing documents.

In this regard, news documents are characterized by immediacy, timeliness, and social relevance and are organized around the 5W1H news framework. There is demand for technology that extracts the WHO, WHEN, and WHERE attributes of this framework, as well as the issues, from the news.

Specifically, because documents such as news articles are structured in the form "someone said something," they often contain a quotation (quote) and the speaker who uttered it. However, because documents are written with a wide variety of words, it is not easy for a computing device, unlike a human who can readily grasp document structure, to recognize quotations and the speakers who uttered them in news articles.

In particular, because documents are not always written in a fixed pattern, a method is needed for efficiently and accurately recognizing quotations and speakers across diverse documents.

The present invention provides a quotation/speaker recognition system and method that improve accuracy by recognizing quotations and the speakers who uttered them through a sentence-analysis approach and pattern-recognition processing.

The present invention provides a quotation/speaker recognition system and method that build a pattern processing dictionary to extract text tokens and recognize the quotations and speakers of a sentence based on those text tokens, thereby improving recognition accuracy.

The present invention provides a quotation/speaker recognition system and method that improve the accuracy of recognizing quotations and speakers by recognizing them with a morphological analyzer.

The present invention provides a quotation/speaker recognition system and method that recognize quotations and speakers with a morphological analyzer and tag each quotation with a speaker based on preset sentence patterns representing the relationship between speakers and quotations, so that quotations and speakers can be matched accurately even when a sentence contains multiple quotations and speakers.

The present invention provides a quotation/speaker recognition system and method that recognize quotations and speakers with a morphological analyzer and tag each quotation with a speaker based on question keywords contained in the sentence together with preset sentence patterns representing the relationship between speakers and quotations, so that quotations and speakers can be matched accurately even when a sentence contains multiple quotations and speakers and presents an exceptional case.

The present invention provides a quotation/speaker recognition system and method that improve the accuracy of recognizing quotations and speakers in sentences, either by combining several recognition methods, or by recognizing quotations and speakers with each method separately and then selecting one method according to its error level.

A quotation/speaker recognition system according to an embodiment of the present invention may include a quotation/speaker recognition unit that analyzes a document and recognizes quotations and speakers, a speaker tagging unit that tags each quotation with its recognized speaker, and a search result providing unit that, upon receiving a search request for a speaker, searches for the quotations uttered by that speaker and provides them as search results.

According to an embodiment of the present invention, the quotation/speaker recognition unit may include a quotation recognition unit that extracts sentences from the document and recognizes quotations in them, a speaker candidate determination unit that analyzes a sentence to determine speaker candidates for a quotation, a speaker selection unit that analyzes the sentence to select, from among the speaker candidates, the speaker who uttered the quotation, and a speaker recognition unit that recognizes the unique name of the selected speaker.

A quotation/speaker recognition method according to an embodiment of the present invention may include analyzing a document to recognize quotations and speakers, tagging each quotation with its recognized speaker, and, upon receiving a search request for a speaker, searching for the quotations uttered by that speaker and providing them as search results.

According to an embodiment of the present invention, a quotation/speaker recognition system and method are provided that improve accuracy by recognizing quotations and the speakers who uttered them through a sentence-analysis approach and pattern-recognition processing.

According to an embodiment of the present invention, a quotation/speaker recognition system and method are provided that improve the accuracy of recognizing quotations and speakers in sentences, either by combining several recognition methods, or by recognizing quotations and speakers with each method separately and then selecting one method according to its error level.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. However, the present invention is not restricted or limited by the embodiments. Like reference numerals in the drawings denote like elements.

FIG. 1 is a block diagram illustrating the overall configuration of a quotation/speaker recognition system according to an embodiment of the present invention.

Referring to FIG. 1, the quotation/speaker recognition system 100 may include a quotation/speaker recognition unit 101, a speaker tagging unit 102, and a search result providing unit 103.

The quotation/speaker recognition unit 101 may analyze a document 108 to recognize quotations and speakers. The document 108 may contain at least one quotation (quote) and speaker. For example, the document 108 may be a news article containing quotations and speakers.

For example, the quotation/speaker recognition unit 101 may recognize quotations and speakers by applying any one of a first through a fourth recognition method, or a combination of them, to a sentence.

The first recognition method builds a pattern processing dictionary to extract text tokens and recognizes the quotations and speakers of a sentence based on those text tokens. The second recognition method extracts part-of-speech information from a sentence using a morphological analyzer and recognizes the quotations and speakers of the document 108 based on that part-of-speech information.

When a sentence contains multiple quotations, the third and fourth recognition methods also extract part-of-speech information from the sentence using a morphological analyzer and recognize the quotations and speakers of the document 108 based on it. The third recognition method tags each quotation with a speaker based on preset sentence patterns representing the relationship between speakers and quotations. The fourth recognition method tags each quotation with a speaker based on question keywords contained in the sentence together with preset sentence patterns representing that relationship.

For example, the quotation/speaker recognition unit 101 may combine the four recognition methods above to recognize quotations and speakers in a sentence. As another example, the quotation/speaker recognition unit 101 may apply each of the preset recognition methods to a sentence to recognize quotations and speakers, and then, considering the error level of each method, select the quotations and speakers produced by the method with the lowest error level.
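The selection variant can be sketched as follows. This is a minimal Python sketch under stated assumptions: each recognition method is modeled as a callable that returns (speaker, quotation) pairs, and the error level is supplied by a caller-provided function (for example, an error rate measured beforehand on labeled sentences). The patent does not say how the error level is measured; `recognize_with_best_method`, `error_of`, and the toy recognizers are hypothetical.

```python
from typing import Callable

Pair = tuple[str, str]  # (speaker, quotation)
Recognizer = Callable[[str], list[Pair]]


def recognize_with_best_method(
    sentence: str,
    methods: dict[str, Recognizer],
    error_of: Callable[[str, list[Pair]], float],
) -> tuple[str, list[Pair]]:
    """Apply every recognition method, then keep the output whose estimated error is lowest."""
    results = {name: recognize(sentence) for name, recognize in methods.items()}
    best = min(results, key=lambda name: error_of(name, results[name]))
    return best, results[best]


# Usage with toy recognizers and hypothetical per-method error rates.
rates = {"first": 0.20, "second": 0.12}
methods = {
    "first": lambda s: [("speaker-1", "quote")],
    "second": lambda s: [("speaker-2", "quote")],
}
best, pairs = recognize_with_best_method("...", methods, lambda name, _: rates[name])
print(best, pairs)
```

Passing the recognizer output to `error_of` leaves room for per-sentence error estimates (for instance, penalizing an empty result) rather than only a fixed per-method rate.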

The first through fourth recognition methods are described in detail with reference to FIGS. 2 to 10.

For example, the quotation/speaker recognition unit 101 may include a quotation recognition unit 104, a speaker candidate determination unit 105, a speaker selection unit 106, and a speaker recognition unit 107.

The quotation recognition unit 104 may extract sentences from the document 108 and recognize quotations in them. The document 108 may consist of multiple sentences, each of which may contain at least one speaker and quotation. For example, the quotation recognition unit 104 may recognize a quotation by tokenizing the sentence and extracting the quotation from it.

For example, under the first through fourth recognition methods, the quotation recognition unit 104 may recognize as a quotation the phrase that runs from the point where a quotation mark opens in the sentence to the point where it closes.
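A minimal sketch of this quotation-mark rule, using Python's `re` module. It assumes that only double quotation marks (straight `"` or curly `“…”`) delimit quotations; single quotes, which Korean news often uses for titles, are deliberately ignored, which is narrower than the " or ' rule described for FIG. 2. `extract_quotations` is a hypothetical helper name.

```python
import re

# A straight-double-quoted span or a curly-double-quoted span; the text between the marks is captured.
QUOTE_RE = re.compile(r'"([^"]*)"|“([^”]*)”')


def extract_quotations(sentence: str) -> list[str]:
    """Return every phrase between an opening and a closing double quotation mark, in order."""
    # findall returns one (straight, curly) tuple per match; exactly one group is non-empty.
    return [straight or curly for straight, curly in QUOTE_RE.findall(sentence)]


s = '그러면서 권 의원은 "지금 이제 ‥ 보고 있다"며, "한반도에서의 평화는 ‥ 풀어야 하는 것"이라고 강조했다.'
print(extract_quotations(s))
```

Because the pattern excludes the closing mark from the captured text, two quotations in one sentence come back as two separate phrases.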

The speaker candidate determination unit 105 may analyze a sentence to determine speaker candidates for a quotation. In other words, because a single sentence may contain multiple quotations and multiple speakers, the unit determines which of those speakers are candidates for having uttered the quotation.

For example, under the first through fourth recognition methods, the speaker candidate determination unit 105 may determine the noun phrase that precedes a nominative (subject) particle in the sentence to be a speaker candidate.

The speaker selection unit 106 may analyze a sentence and select, from among the speaker candidates, the speaker who uttered the quotation. Specifically, the speaker selection unit 106 may determine the relationship between the quotation and the speaker by analyzing the context of the quotation.

The speaker recognition unit 107 may recognize the unique name of the speaker. For example, when the speaker is referred to as "Chairman A," the speaker recognition unit 107 may recognize Chairman A's actual full name.

The speaker recognition unit 107 can recognize the speaker's unique name even when the speaker is written as a surname followed by a title. Specifically, news articles present the proper noun that a subject such as "President Kim" (김 대통령) refers to, such as "President Kim Dae-jung" or "President Kim Young-sam," before they use the short form.

According to an embodiment of the present invention, the quotation/speaker recognition unit 101 may analyze the entire document and store each speaker's unique name as it is first written. Then, when the speaker recognized for a particular quotation is written as "surname + position (title)," the speaker recognition unit 107 may compare that speaker with the stored unique names and match them, thereby recognizing the speaker's unique name.
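The surname-plus-title matching can be sketched as below, assuming the first mentions have already been extracted into (full name, title) pairs in the order they appear. The function name and data layout are hypothetical. Returning `None` when there is no match, or more than one, corresponds to the exceptional cases (no prior full name, or several similar names) that the system resolves from context.

```python
from typing import Optional


def resolve_short_name(short_form: str, first_mentions: list[tuple[str, str]]) -> Optional[str]:
    """Resolve a 'surname title' form such as '김 대통령' against stored first-mention full names."""
    parts = short_form.split(maxsplit=1)
    if len(parts) != 2:
        return None  # not in "surname + title" form
    surname, title = parts
    matches: list[str] = []
    for full_name, full_title in first_mentions:
        # Same title, full name begins with the surname, and is longer than the surname alone.
        if (full_title == title and full_name.startswith(surname)
                and len(full_name) > len(surname) and full_name not in matches):
            matches.append(full_name)
    # Exactly one candidate: resolved. Zero or several: leave for context-based handling.
    return matches[0] if len(matches) == 1 else None


mentions = [("김대중", "대통령")]
print(resolve_short_name("김 대통령", mentions))
```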

If the speaker's unique name does not appear in a preceding sentence, or if several similar names have appeared, the speaker recognition unit 107 may analyze the context of the surrounding sentences and perform exception handling.

Through this process, the quotation/speaker recognition unit 101 can recognize quotations and speakers in the sentences of a document.

The speaker tagging unit 102 may tag each recognized quotation with its speaker. That is, the speaker tagging unit 102 may link a speaker and a quotation into a single set 109, thereby indexing quotations by speaker. Later, when a speaker is entered through a person search, the quotation/speaker recognition system 100 may extract at least one quotation tagged with that speaker and provide it as a search result.

For example, tagging the quotations recognized in sentences with their speakers may be performed in real time, or in advance, before a user searches for a speaker, with the results stored in a database.

Upon receiving a search request for a speaker, the search result providing unit 103 may search for the quotations uttered by that speaker and provide them as search results. For example, when a user enters a speaker's name, the search result providing unit 103 may retrieve at least one quotation tagged with that speaker and provide it to the user as a search result. Quotations may be presented to users in various forms depending on the service.

For example, the search result providing unit 103 may search for the quotations uttered by a particular speaker and provide them to the user. The search result providing unit 103 may also receive a search query together with a speaker name and return those of the speaker's quotations that contain the query. In addition, the search result providing unit 103 may extract and provide the words that a particular speaker mentioned most often in quotations during a preset period.
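Speaker tagging and the three search variants above can be sketched with an in-memory index keyed by speaker. The class and field names are hypothetical, the database is replaced by a dictionary, and words are counted by whitespace splitting for brevity; Korean text would normally be split with a morphological analyzer instead. The sample quotations are invented for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TaggedQuote:
    speaker: str      # resolved unique name of the speaker
    text: str         # quotation text
    uttered_on: date  # date of the source document


class QuoteIndex:
    """Hypothetical in-memory stand-in for the speaker-indexed quotation store (set 109)."""

    def __init__(self) -> None:
        self._by_speaker: dict[str, list[TaggedQuote]] = defaultdict(list)

    def tag(self, quote: TaggedQuote) -> None:
        self._by_speaker[quote.speaker].append(quote)

    def search(self, speaker: str, query: Optional[str] = None) -> list[TaggedQuote]:
        """All quotations by `speaker`, optionally only those containing `query`."""
        quotes = self._by_speaker.get(speaker, [])
        return [q for q in quotes if query is None or query in q.text]

    def top_words(self, speaker: str, start: date, end: date, n: int = 3) -> list[str]:
        """Most frequent whitespace-separated words in `speaker`'s quotations within [start, end]."""
        counts: Counter = Counter()
        for q in self._by_speaker.get(speaker, []):
            if start <= q.uttered_on <= end:
                counts.update(q.text.split())
        return [word for word, _ in counts.most_common(n)]


index = QuoteIndex()
index.tag(TaggedQuote("Analyst A", "rates will rise", date(2024, 1, 10)))
index.tag(TaggedQuote("Analyst A", "rates may rise again", date(2024, 2, 1)))
index.tag(TaggedQuote("Analyst A", "stocks look cheap", date(2024, 6, 1)))
print([q.text for q in index.search("Analyst A", "rates")])
```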

As another example, when the speaker is an expert, the search result providing unit 103 may provide data related to the expert's quotation together with the quotation. For example, if the speaker is a securities analyst, the unit may provide the speaker's quotation analyzing stock prices together with the stock index at the time the quotation was uttered.
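Pairing a quotation with the index value in effect when it was uttered is essentially an as-of lookup on a date-sorted series. A minimal sketch using the standard `bisect` module, assuming the index series is a list of (date, value) pairs sorted by date. The function name and the sample values are hypothetical.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional


def value_as_of(when: date, series: list[tuple[date, float]]) -> Optional[float]:
    """Latest series value dated on or before `when`; None if the series starts later.

    `series` must be sorted by date in ascending order.
    """
    dates = [d for d, _ in series]
    i = bisect_right(dates, when)  # number of entries dated on or before `when`
    return series[i - 1][1] if i > 0 else None


series = [(date(2024, 1, 2), 100.0), (date(2024, 1, 3), 101.5), (date(2024, 1, 5), 99.0)]
print(value_as_of(date(2024, 1, 4), series))
```

Using the last value on or before the utterance date means a quotation from a non-trading day is paired with the most recent close.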

As another example, when a user enters a particular speaker, the search result providing unit 103 may also search for and provide quotations uttered by people associated with that speaker.

FIG. 2 is a diagram illustrating a first recognition method for recognizing quotations/speakers according to an embodiment of the present invention.

FIG. 2 shows a process of recognizing quotations and speakers in a sentence by means of an automaton. Specifically, FIG. 2 shows a process of building a pattern processing dictionary and extracting text tokens before quotations and speakers are recognized in the sentence.

As an input sentence passes through the states 202 to 207 shown in FIG. 2, multiple text tokens can be extracted from it. The table in FIG. 2 represents the pattern processing dictionary. As the table shows, the pattern processing dictionary works on phrase units made up of the words that constitute a sentence. In FIG. 2, AS 202 denotes the after-subject-phrase state, BQ 203 the before-quotation state, Q 204 the quotation state, AQ 205 the after-quotation state, BV 206 the before-verb-phrase state, and AV 207 the after-verb-phrase state; S 201 denotes the start and F 208 the finish. That is, the quotation/speaker recognition system extracts text tokens from a sentence using the pattern processing dictionary and recognizes quotations and speakers using the extracted text tokens.

Quotations and speakers are recognized as the sentence enters each state in the process from S 201 to F 208. The quotation/speaker recognition system determines that the sentence has entered AS 202 after a nominative particle (은/는/이/가). If a quotation mark (" or ') opens after AS 202, the sentence enters Q 204; if no quotation mark opens after AS 202, the sentence enters BQ 203. Referring to FIG. 2, the BQ 203 state may repeat. When a quotation mark opens after the BQ 203 state, the sentence enters Q 204.

When the quotation mark closes after Q 204, the sentence enters AQ 205. If a quotative particle (고/며/라고/라며) appears after AQ 205, the sentence enters BV 206. Alternatively, the sentence may return from AQ 205 to BQ 203.

If a new quotation mark opens after BV 206, the sentence enters Q 204. If a verb (such as ~했다, "did/said") appears after BV 206, the sentence enters AV 207. When a period appears, the sentence enters F 208 and the process ends.

FIG. 3 is a diagram illustrating an example of applying the first recognition method to a sentence according to an embodiment of the present invention. In FIG. 3, the first recognition method is applied to the following sentence:

『홍 의원은 이날 BBS 라디오 '김재원의 아침 저널'에 출연해 19세기 독일 비스마르크와 라 살레의 협력을 예로 들며 "박 대표가 애국심이 있으니 공유할 분야가 있다"라고 말했다.』 (Appearing that day on BBS Radio's "Kim Jae-won's Morning Journal," Rep. Hong cited the 19th-century cooperation between Bismarck and Lassalle in Germany as an example and said, "Party leader Park is patriotic, so there are areas we can share.")

When the first recognition method is applied to this sentence, the sentence passes through the start S 201 and, after the nominative particle '은' in '홍 의원은', enters the after-subject state AS 202. That is, '홍 의원' (Rep. Hong) is the subject. Because no quotation mark opens immediately after AS 202, the sentence enters the before-quotation state BQ 203. When the quotation mark (") then opens, the sentence enters the quotation state Q 204, and when the quotation mark closes, it enters the after-quotation state AQ 205. Here, the quotation is the phrase from the opening to the closing quotation mark: "박 대표가 애국심이 있으니 공유할 분야가 있다" ("Party leader Park is patriotic, so there are areas we can share").

Next, because the quotative particle '고' (here, as part of '라고') follows the quotation, the sentence enters the before-verb-phrase state BV 206. After the verb phrase '말했다' (said), the sentence enters the after-verb-phrase state AV 207. Because a period follows the verb phrase, the sentence terminates at F 208.

At this point, the quotation/speaker recognition system can recognize as a quotation the phrase from the point where the quotation mark opens to the point where it closes. In the sentence above, "박 대표가 애국심이 있으니 공유할 분야가 있다" is recognized as the quotation.

The quotation/speaker recognition system can also determine the noun phrase that precedes the nominative particle in the sentence to be a speaker candidate. In the sentence above, '홍 의원' (Rep. Hong) is determined to be the speaker candidate for the quotation. The system then recognizes the speaker of the quotation by resolving the candidate '홍 의원' to the unique name 홍xx.
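The FIG. 2 automaton can be sketched as a table-driven state machine that follows the transitions described above. Assumptions: the sentence is tokenized by hand (the patent does not specify a tokenizer), `PATTERN_DICT` is a small hypothetical stand-in for the table in FIG. 2, verbs are detected by a crude "~했다" suffix rule, and only double quotes count as quotation marks (so the single-quoted program title is skipped, as in the walkthrough). On an abridged version of the FIG. 3 sentence it enters S, AS, BQ, Q, AQ, BV, AV, F in turn.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the FIG. 2 pattern processing dictionary: token -> token class.
PATTERN_DICT = {
    "은": "SUBJ", "는": "SUBJ", "이": "SUBJ", "가": "SUBJ",  # nominative particles
    '"': "QUOTE",  # double quotes only (assumption)
    "고": "QUOTATIVE", "며": "QUOTATIVE", "라고": "QUOTATIVE", "라며": "QUOTATIVE",
    ".": "PERIOD",
}

# (state, token class) -> next state, per the FIG. 2 transitions.
TRANSITIONS = {
    ("S", "SUBJ"): "AS",
    ("AS", "QUOTE"): "Q", ("BQ", "QUOTE"): "Q",
    ("Q", "QUOTE"): "AQ",
    ("AQ", "QUOTATIVE"): "BV",
    ("BV", "QUOTE"): "Q", ("BV", "VERB"): "AV",
    ("AV", "PERIOD"): "F",
}
# Next state when no explicit transition applies (AS -> BQ, AQ -> BQ, BQ repeats, ...).
DEFAULT = {"S": "S", "AS": "BQ", "BQ": "BQ", "Q": "Q", "AQ": "BQ", "BV": "BV", "AV": "AV"}


def classify(token: str) -> str:
    if token in PATTERN_DICT:
        return PATTERN_DICT[token]
    if token.endswith("했다"):  # crude "~했다" verb-phrase rule (assumption)
        return "VERB"
    return "OTHER"


@dataclass
class Recognition:
    speaker_candidate: str = ""
    quotations: list[str] = field(default_factory=list)
    path: list[str] = field(default_factory=lambda: ["S"])  # states entered, in order
    accepted: bool = False


def run_first_method(tokens: list[str]) -> Recognition:
    result, state = Recognition(), "S"
    subject: list[str] = []
    quote: list[str] = []
    for token in tokens:
        nxt = TRANSITIONS.get((state, classify(token)), DEFAULT[state])
        if state == "S":
            if nxt == "S":
                subject.append(token)  # words before the nominative particle
            else:
                result.speaker_candidate = " ".join(subject)
        elif state == "Q":
            if nxt == "Q":
                quote.append(token)  # words between the quotation marks
            else:
                result.quotations.append(" ".join(quote))
                quote = []
        if nxt != state:
            result.path.append(nxt)
        state = nxt
        if state == "F":
            break
    result.accepted = state == "F"
    return result


# Abridged, hand-tokenized version of the FIG. 3 sentence.
tokens = ["홍", "의원", "은", "이날", "BBS", "라디오", "출연해",
          '"', "박", "대표가", "애국심이", "있으니", "공유할", "분야가", "있다", '"',
          "라고", "말했다", "."]
print(run_first_method(tokens))
```

Keeping the transitions in a table makes the automaton easy to compare against the figure; resolving '홍 의원' to a unique name is a separate step and is not shown.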

FIG. 4 is a diagram illustrating a second recognition method for recognizing quotations/speakers according to an embodiment of the present invention.

FIG. 4 shows a process of recognizing quotations and speakers in a sentence by means of an automaton. Specifically, FIG. 4 shows a process of extracting part-of-speech information from a sentence using a morphological analyzer as a partial parser. The quotation/speaker recognition system can recognize the quotations and speakers of a document based on the extracted part-of-speech information.

As an input sentence passes through the states 402 to 407 shown in FIG. 4, part-of-speech information can be extracted from it. The table in FIG. 4 represents the morphological analyzer. As the table shows, noun phrases, quotations, endings, and verbs in the sentence are stored, and everything else is passed over.

Referring to FIG. 4, N 402 denotes a noun phrase, R 403 other (remaining) elements, Q 404 a quotation, C 405 a quoted phrase, E 406 an ending, and V 407 a verb; S 401 denotes the start and F 408 the finish. Here, a noun phrase may include suffixes, prefixes, nouns, Hangul numerals, determiners, single-syllable nouns, pronouns, proper nouns, and Latin letters or digits.

The sentence enters N 402 from the start 401. When a nominative particle (은/는/이/가) appears, the sentence enters R 403. When a quotation mark (") then opens, the sentence enters Q 404, and when the quotation mark closes, it enters C 405. If a quotative particle (고/며/라고/라며) appears where the quotation mark closes, the sentence enters R 403. If another quotation mark then opens, the sentence enters Q 404 again.

If a noun (NOUN) appears after the R 403 state, the sentence enters E 406; if a verb (VERB) appears after R 403, the sentence enters V 407. If a noun appears after the sentence has entered V 407, the sentence returns to R 403. If a period appears after V 407, the sentence enters F 408. In this way, the process shown in FIG. 4 extracts multiple pieces of part-of-speech information from the sentence.

The quotation/speaker recognition system can then recognize as a quotation the phrase from the point where a quotation mark opens in the sentence to the point where it closes. It can also determine the noun phrase that precedes the nominative particle in the sentence to be a speaker candidate.

도 5은 본 발명의 일실시예에 따라 문장에 제2 인식 방법을 적용한 일례를 도시한 도면이다. 도 5에서는 다음과 같은 문장에 제2 인식 방법을 적용할 수 있다. 5 is a diagram illustrating an example of applying a second recognition method to a sentence according to an embodiment of the present invention. In FIG. 5, the second recognition method can be applied to the following sentence.

『그러면서 권 의원은 "지금 이제 한치 앞도 내다 볼 수 없는 남북 관계가 ‥ 보고 있다"며, "한반도에서의 평화는 ‥ 남북 관계는 풀어야 하는 것"이라고 강조했다.』 (English gloss: Rep. Kwon then said, "Now, the inter-Korean relationship, in which no one can see even an inch ahead, ‥ we are watching," and stressed, "Peace on the Korean Peninsula ‥ the inter-Korean relationship is something that must be resolved.")

When the above sentence is processed with the second recognition method, the sentence passes through the start (401). When '권 의원' (Rep. Kwon) appears, it is recognized as a noun phrase and the sentence enters N (402). After the nominative particle '은', the sentence may enter R (403). When a quotation mark then opens, the sentence may enter Q (404).

When the quotation mark closes, the sentence enters C (405). When the quotative particle '며' then appears, the sentence may enter R (403). When another quotation mark opens, the sentence enters Q (404) again, and when that quotation mark closes, the sentence may enter C (405). Thus, the example sentence contains two quoted phrases: "지금 이제 ~ 보고 있다" ("Now ~ watching") and "한반도에서의 ~ 풀어야 하는 것" ("On the Korean Peninsula ~ what must be resolved").

When the quotative particle '이라고' then appears, the sentence enters R (403). When the noun '강조' (emphasis) and the verb '했다' (did) follow, the sentence enters E (406) and V (407), respectively. Finally, when a period appears, the sentence may terminate via F (408). Through this process, the part-of-speech information of the sentence can be extracted.

The quotation/speaker recognition system can then recognize the quotations and the speaker in the sentence based on the part-of-speech information. For example, the system can recognize, as a quotation, the phrase from the point where a quotation mark opens to the point where it closes. For the above sentence, the quoted phrases "지금 이제 ~ 보고 있다" and "한반도에서의 ~ 풀어야 하는 것" can be recognized as quotations.

The system can also determine the noun phrase located before the nominative particle as the speaker candidate. For the above sentence, '권 의원' (Rep. Kwon) can be determined as the speaker candidate. The system can then recognize 권xx (Kwon xx), the unique name behind '권 의원'. In the example of FIG. 5, Kwon xx is therefore identified as having uttered both quotations, and both quotations can be tagged with the speaker Kwon xx.

FIG. 6 is a diagram illustrating a third recognition method for recognizing quotations and speakers according to an embodiment of the present invention.

FIG. 6 shows, as a logic diagram, the process of recognizing quotations and speakers from a sentence. FIG. 6 is particularly relevant when a sentence contains a plurality of quotations and a plurality of speakers. Specifically, the quotation/speaker recognition system extracts part-of-speech information from the sentence using a morpheme analyzer. It then recognizes the quotations and speakers of the document based on that information. The system can then tag each quotation with a speaker based on a preset sentence pattern that represents the relationship between speakers and quotations.

As the input sentence passes through the states (602 to 604) shown in FIG. 6, part-of-speech information can be extracted from it. The table in FIG. 6 represents the morpheme analyzer. As the table shows, noun phrases and quotations in the sentence are stored, and all other elements are passed over.

After the start (601), a noun phrase is extracted from the sentence and its part of speech is determined, after which the sentence may enter N (602). If a quotation mark opens immediately after the start (601), the sentence may instead enter Q (604).

While in the N (602) state, if a nominative particle (은/는/이/가) appears, the sentence may enter R (603). If a period appears in R (603), the sentence enters F (605) and terminates. If a quotation mark (") opens after R (603), the sentence enters Q (604), and the quoted phrase can be extracted.

Here, the quotation/speaker recognition system can recognize, as a quotation, the phrase from the point where a quotation mark opens to the point where it closes. It can also determine, as a speaker candidate, a preset noun phrase combined with a nominative particle.

Thus, according to FIG. 6, a plurality of subjects and quoted phrases in a sentence can be extracted through an iterative process. The extracted subjects and quoted phrases can be recognized as speakers and quotations, respectively. When multiple speakers and quotations are recognized, it is important to determine which speaker uttered which quotation. This relationship can be determined using preset sentence patterns, an example of which is described with reference to FIG. 8.
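The iterative extraction of FIG. 6 can be sketched as a loop that records speaker candidates (N) and quotations (Q) in the order they occur. The FIG. 8 patterns depend on this ordering. This is a minimal sketch. It assumes a morpheme analyzer has already split the sentence into (surface form, tag) pairs. The tag names NP, JKS, QUOTE, PERIOD, and TEXT are hypothetical. Any token other than a noun phrase, nominative particle, quotation mark, or period is passed over without a state change. The return from R to N on a new noun phrase follows the FIG. 7 walk-through.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (surface form, hypothetical tag)


def extract_elements(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Return ("N", speaker candidate) and ("Q", quotation) pairs in sentence order."""
    state = "S"
    last_np = ""
    buffer: List[str] = []
    elements: List[Tuple[str, str]] = []
    for surface, tag in tokens:
        if state == "Q":
            if tag == "QUOTE":               # closing mark: back to R (FIG. 6 has no C state)
                elements.append(("Q", " ".join(buffer)))
                buffer, state = [], "R"
            else:
                buffer.append(surface)
        elif tag == "NP":                    # a (new) subject noun phrase: enter N
            last_np, state = surface, "N"
        elif tag == "JKS" and state == "N":  # noun phrase + nominative particle
            elements.append(("N", last_np))
            state = "R"
        elif tag == "QUOTE" and state in ("S", "R"):
            state = "Q"                      # opening quotation mark
        elif tag == "PERIOD" and state == "R":
            break                            # F: end of sentence
        # any other token is passed over without changing state
    return elements


# The FIG. 7 sentence, hand-tokenized for illustration.
EXAMPLE = [
    ("정세균 민주당 대표", "NP"), ("는", "JKS"), ("8일", "TEXT"),
    ("국회 문화체육관광방송통신위원회 위원장인", "TEXT"),
    ("고흥길 한나라당 의원", "NP"), ("이", "JKS"),
    ('"', "QUOTE"), ("언론관계법을 ~ 상정하겠다", "TEXT"), ('"', "QUOTE"),
    ("고 밝힌 것에 대해", "TEXT"),
    ('"', "QUOTE"), ("당연히 ~ 상정할 수 있는 것", "TEXT"), ('"', "QUOTE"),
    ("이라고 밝혔다", "TEXT"), (".", "PERIOD"),
]
```

For this example the element kinds come out as N, N, Q, Q. This is the N1-N2-Q1-Q2 pattern discussed for FIG. 8.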

FIG. 7 is a diagram illustrating an example in which the third recognition method is applied to a sentence according to an embodiment of the present invention. FIG. 7 shows how the third recognition method is applied to the following sentence to recognize speakers and quotations and to tag each quotation with a speaker.

『정세균 민주당 대표는 8일 국회 문화체육관광방송통신위원회 위원장인 고흥길 한나라당 의원이 "언론관계법을 2월 임시 국회에 상정하겠다"고 밝힌 것에 대해 "당연히 상정에 대해 합의가 돼야 상정할 수 있는 것"이라고 밝혔다.』 (English gloss: On the 8th, Democratic Party leader Chung Sye-kyun (정세균) responded to a statement by Grand National Party lawmaker Ko Heung-kil (고흥길), chairman of the National Assembly's Culture, Sports, Tourism, Broadcasting and Communications Committee. Ko had said, "I will table the media-related bills at the extraordinary session of the National Assembly in February." Chung said, "Naturally, they can be tabled only after an agreement on tabling them has been reached.")

When the above sentence is processed with the third recognition method, the sentence passes through the start (601). When '정세균 민주당 대표' (Democratic Party leader Chung Sye-kyun) appears, it is recognized as a noun phrase and the sentence enters N (602). After the nominative particle '는', the sentence may enter R (603). When the next subject noun phrase ('고흥길 한나라당 의원') appears, the sentence enters N (602) again, and after the nominative particle '이' it may enter R (603). Remaining in R (603), the sentence enters Q (604) when a quotation mark opens and stays in Q (604) until the quotation mark closes. When the quotation mark closes, the sentence may return to R (603). When another quotation mark opens, the sentence may enter Q (604) again, and when it closes, the sentence may return to R (603). When a period then appears, the sentence may terminate via F (605).

Through this process, the part-of-speech information of the sentence is extracted. The quotation/speaker recognition system can then determine the noun phrases located before nominative particles as speaker candidates and recognize the phrases between quotation marks as quotations. For the above sentence, '정세균 민주당 대표' and '고흥길 한나라당 의원' can be determined as speaker candidates. "언론관계법을 ~ 상정하겠다" ("I will table the media-related bills ~") and "당연히 ~ 상정할 수 있는 것" ("Naturally ~ can be tabled") can be recognized as quotations. The speakers can then be identified by the unique names of the candidates, 정세균 (Chung Sye-kyun) and 고흥길 (Ko Heung-kil).

Because this sentence yields multiple quotations and multiple speakers, the speaker tagging process, which determines which speaker uttered which quotation, is important. A human can easily attribute each quotation to its speaker from context, but this is difficult for a system. According to an embodiment of the present invention, the quotation/speaker recognition system predefines frequently used sentence patterns and determines the relationship between speakers and quotations from those patterns. The sentence patterns are described in detail with reference to FIG. 8.

FIG. 8 is a table showing an example of tagging speakers according to the sentence pattern formed by speakers and quotations, according to an embodiment of the present invention.

Referring to FIG. 8, when a plurality of speakers and quotations are recognized in a sentence, the relationship between them can be determined from the sentence pattern. In FIG. 8, N denotes a speaker and Q denotes a quotation.

If the sentence yields one speaker and one quotation (pattern N1-Q1), quotation Q1 can be tagged as uttered by speaker N1. For the pattern N1-Q1-Q2, quotations Q1 and Q2 can both be tagged as uttered by speaker N1.

For the pattern N1-Q1-N2-Q2, quotation Q1 can be tagged as uttered by speaker N1 and quotation Q2 as uttered by speaker N2. For the pattern N1-N2-Q1-Q2, quotation Q1 can be tagged as uttered by speaker N2 and quotation Q2 as uttered by speaker N1. When there is no speaker, as in the patterns Q1 and Q1-Q2, the quotations may be left without a speaker tag.
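The FIG. 8 rules listed above can be written as a lookup table keyed by the sentence pattern. This is a minimal sketch. It assumes the speakers and quotations have already been extracted in sentence order as ("N", text) and ("Q", text) pairs. The table contains only the patterns named in the text. A sentence with any other pattern is reported as unmatched (None) rather than guessed.

```python
from typing import Dict, List, Optional, Tuple

# Pattern -> {quotation label: speaker label or None}, as listed for FIG. 8.
PATTERN_TABLE: Dict[str, Dict[str, Optional[str]]] = {
    "N1-Q1": {"Q1": "N1"},
    "N1-Q1-Q2": {"Q1": "N1", "Q2": "N1"},
    "N1-Q1-N2-Q2": {"Q1": "N1", "Q2": "N2"},
    "N1-N2-Q1-Q2": {"Q1": "N2", "Q2": "N1"},
    "Q1": {"Q1": None},
    "Q1-Q2": {"Q1": None, "Q2": None},
}


def tag_speakers(
    elements: List[Tuple[str, str]],
) -> Optional[List[Tuple[str, Optional[str]]]]:
    """Map each quotation to its speaker; return None if the pattern is not in the table."""
    labels: List[str] = []
    speakers: Dict[str, str] = {}
    quotes: Dict[str, str] = {}
    for kind, text in elements:
        if kind == "N":
            label = f"N{len(speakers) + 1}"   # N1, N2, ... in order of appearance
            speakers[label] = text
        else:
            label = f"Q{len(quotes) + 1}"     # Q1, Q2, ... in order of appearance
            quotes[label] = text
        labels.append(label)
    rule = PATTERN_TABLE.get("-".join(labels))
    if rule is None:
        return None
    return [(quotes[q], speakers[n] if n else None) for q, n in rule.items()]
```

With the FIG. 7 elements (N1 = 정세균, N2 = 고흥길), the first quotation goes to 고흥길 and the second to 정세균, as described for FIG. 8.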

When FIG. 8 is applied to the sentence described in FIG. 7, the sentence has the pattern N1-N2-Q1-Q2. The quotation "언론관계법을 ~ 상정하겠다" is therefore tagged as uttered by the speaker Ko Heung-kil (고흥길), and the quotation "당연히 ~ 상정할 수 있는 것" can be tagged as uttered by the speaker Chung Sye-kyun (정세균).

The example shown in FIG. 8 is merely one example of the relationship between speakers and quotations according to sentence patterns. The quotation/speaker recognition system according to an embodiment of the present invention is not limited to the example of FIG. 8.

FIG. 9 is a diagram illustrating a fourth recognition method for recognizing quotations and speakers according to an embodiment of the present invention.

FIG. 9 shows, as a logic diagram, the process of recognizing quotations and speakers from a sentence. FIG. 9 is particularly relevant when a sentence contains a plurality of quotations and a plurality of speakers and also includes a question keyword. In other words, the fourth recognition method handles an exception to the third recognition method: a sentence that contains a question keyword such as 질문 (question) or 물음 (inquiry).

Specifically, the quotation/speaker recognition system extracts part-of-speech information from the sentence using a morpheme analyzer. It then recognizes the quotations and speakers of the document based on that information. The system can then tag each quotation with a speaker based on two inputs: the question keywords in the sentence, and preset sentence patterns representing the relationship between speakers and quotations. For the sentence patterns, refer to FIG. 8.

As the input sentence passes through the states (902 to 905) shown in FIG. 9, part-of-speech information can be extracted from it. The table in FIG. 9 represents the morpheme analyzer. As the table shows, noun phrases, quotations, and question keywords in the sentence are stored, and all other elements are passed over.

After the start (901), the part of speech of a noun phrase in the sentence is determined, after which the sentence may enter N (902). If a quotation mark opens immediately after the start (901), the sentence may instead enter Q (905).

While in the N (902) state, if a nominative particle (은/는/이/가) appears, the sentence may enter R (903). If a period appears in R (903), the sentence enters the end state F and terminates. If a question keyword such as 질문 or 물음 appears after R (903), the sentence may enter EX (904), and the question keyword can be extracted from the sentence and stored. Question keywords can be extracted repeatedly from a sentence. If a quotation mark (") opens after R (903), the sentence enters Q (905), and the quoted phrase can be extracted.

Here, the quotation/speaker recognition system can recognize, as a quotation, the phrase from the point where a quotation mark opens to the point where it closes. It can also determine, as a speaker candidate, the noun phrase located before the nominative particle in the sentence.

Thus, according to FIG. 9, a plurality of subjects and quoted phrases in a sentence can be extracted through an iterative process. If the sentence contains a question keyword, the keyword is extracted as well. The extracted subjects and quoted phrases can be recognized as speakers and quotations, respectively. When multiple speakers and quotations are recognized, it is important to determine which speaker uttered which quotation. This relationship can be determined using preset sentence patterns, an example of which is described with reference to FIG. 8. However, the speaker that the sentence pattern assigns to a quotation may change depending on the question keyword.

FIG. 10 is a diagram illustrating an example in which the fourth recognition method is applied to a sentence according to an embodiment of the present invention. FIG. 10 shows how the fourth recognition method is applied to the following sentence to recognize speakers and quotations and to tag each quotation with a speaker.

『민주당 원혜경 원내대표는 이날 CBS 라디오 '김현정의 뉴스쇼'에 출연해 "북한의 대남 강경 태도가 김 위원장의 와병중 생기는 일시적 현상으로 보는가"라는 질문에 "일시적인 것으로 봐야될지 상황이 악화될 때 어떻게 될 지 등은 신중하게 판단해야 한다"며 이같이 밝혔다.』 (English gloss: Democratic Party floor leader Won Hye-kyung (원혜경) appeared that day on CBS Radio's 'Kim Hyun-jung's News Show'. She was asked, "Do you view the North's hard-line stance toward the South as a temporary phenomenon arising while Chairman Kim is ill?" She answered, "Whether it should be seen as temporary, and what will happen if the situation worsens, must be judged carefully.")

When the above sentence is processed with the fourth recognition method, the sentence passes through the start (901). When '민주당 원혜경 원내대표' (Democratic Party floor leader Won Hye-kyung) appears, it is recognized as a noun phrase and the sentence enters N (902). After the nominative particle '는', the sentence may enter R (903). Remaining in R (903), the sentence enters Q (905) when a quotation mark opens and stays in Q (905) until the quotation mark closes. When the quotation mark closes, the sentence may return to R (903).

Remaining in R (903), when the question keyword appears, the sentence enters EX (904) and then returns to R (903). Still in R (903), when another quotation mark opens, the sentence enters Q (905) and stays in Q (905) until the quotation mark closes. When the quotation mark closes, the sentence may return to R (903). When a period then appears, the sentence may terminate via the end state F.

Through this process, the part-of-speech information of the sentence is extracted. The quotation/speaker recognition system can then determine the noun phrase located before the nominative particle as the speaker candidate and recognize the phrases between quotation marks as quotations. For the above sentence, '민주당 원혜경 원내대표' can be determined as the speaker candidate. The quoted phrases "북한의 대남 강경 태도가 ~ 일시적 현상으로 보는가" ("Do you view the North's hard-line stance toward the South ~ as a temporary phenomenon?") and "일시적인 것으로 ~ 신중하게 판단해야 한다" ("Whether it is temporary ~ must be judged carefully") can be recognized as quotations. The speaker can then be identified by the unique name of the candidate, 원혜경 (Won Hye-kyung).

Because this sentence yields multiple quotations, the speaker tagging process, which determines which speaker uttered which quotation, is important. A human can easily attribute each quotation to its speaker from context, but this is difficult for a system. According to an embodiment of the present invention, the quotation/speaker recognition system predefines frequently used sentence patterns and determines the relationship between speakers and quotations from those patterns. The sentence patterns are described in detail with reference to FIG. 8. However, when a sentence contains a question keyword such as '질문' (question), as in FIG. 10, the sentence patterns of FIG. 8 cannot be applied as they are. The recognized quotations and speakers must therefore be matched with the question keyword taken into account.

When FIG. 8 is applied to the sentence of FIG. 10, the sentence has the pattern N1-Q1-Q2, so quotations Q1 and Q2 would both be tagged as uttered by speaker N1. In the actual context, however, this conclusion is wrong. Quotation Q1 was not uttered by N1. It was a question asked by a third party distinct from N1, so Q1 cannot be tagged with speaker N1.

Therefore, even though the sentence of FIG. 10 has the pattern N1-Q1-Q2, the quotation/speaker recognition system tags speaker N1 only for quotation Q2. The quotation "일시적인 것으로 ~ 판단해야 한다" is tagged as uttered by the speaker Won Hye-kyung (원혜경). The quotation "북한의 대남 강경 태도 ~ 일시적 현상으로 보는가" can be tagged as having no speaker.
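One way to implement the FIG. 10 exception has three steps. First, treat any quotation immediately followed by a question keyword (as in '...라는 질문에') as a third party's question. Second, leave that quotation untagged. Third, apply the FIG. 8 table to the remaining speakers and quotations. This rule is an assumption inferred from the FIG. 10 example; the text does not spell out the procedure. The input format (ordered ("N", text), ("Q", text), ("EX", keyword) pairs) is illustrative. Patterns without a speaker, such as Q1, fall through to an untagged result.

```python
from typing import Dict, List, Optional, Tuple

QUESTION_KEYWORDS = {"질문", "물음"}  # keywords named in the text

# FIG. 8 patterns: {quotation label: speaker label or None}.
PATTERN_TABLE: Dict[str, Dict[str, Optional[str]]] = {
    "N1-Q1": {"Q1": "N1"},
    "N1-Q1-Q2": {"Q1": "N1", "Q2": "N1"},
    "N1-Q1-N2-Q2": {"Q1": "N1", "Q2": "N2"},
    "N1-N2-Q1-Q2": {"Q1": "N2", "Q2": "N1"},
}


def tag_with_questions(elements: List[Tuple[str, str]]) -> Dict[str, Optional[str]]:
    """elements: ("N", text), ("Q", text), ("EX", keyword) in sentence order."""
    tags: Dict[str, Optional[str]] = {}
    kept: List[Tuple[str, str]] = []
    for i, (kind, text) in enumerate(elements):
        followed_by_question = (
            i + 1 < len(elements)
            and elements[i + 1][0] == "EX"
            and elements[i + 1][1] in QUESTION_KEYWORDS
        )
        if kind == "Q" and followed_by_question:
            tags[text] = None                # a third party's question: no speaker
        elif kind != "EX":
            kept.append((kind, text))
    # Apply the FIG. 8 table to the remaining speakers and quotations.
    labels: List[str] = []
    speakers: Dict[str, str] = {}
    quotes: Dict[str, str] = {}
    for kind, text in kept:
        table = speakers if kind == "N" else quotes
        label = f"{kind}{len(table) + 1}"
        table[label] = text
        labels.append(label)
    rule = PATTERN_TABLE.get("-".join(labels), {})
    for q_label, q_text in quotes.items():
        n_label = rule.get(q_label)
        tags[q_text] = speakers[n_label] if n_label else None
    return tags
```

For the FIG. 10 elements, the remaining pattern after the question is removed is N1-Q1. The answer is therefore tagged with the floor leader, and the question stays untagged.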

FIG. 11 is a diagram illustrating an example of determining quotation-speaker combinations by combining the quotation/speaker recognition methods according to an embodiment of the present invention.

FIG. 11 shows four recognition methods for recognizing quotations and speakers from a document. Recognition methods 1, 2, 3, and 4 correspond to the first, second, third, and fourth recognition methods described above.

According to an embodiment of the present invention, the quotation/speaker recognition system can recognize quotations and speakers from the sentences in a document by combining one or more preset recognition methods.

That is, the system can recognize quotations and speakers from a sentence by using the first, second, third, and fourth recognition methods together.

According to another embodiment of the present invention, the system applies one or more preset recognition methods to recognize quotations and speakers from the sentences in a document. It then compares the error degree of each recognition method and selects the quotations and speakers produced by the method with the lowest error degree.

That is, the system applies each of the first through fourth recognition methods to recognize quotations and speakers from a sentence. It compares the errors in the results of each method and selects (votes for) the method with the lowest error to obtain the final quotations and speakers.
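The selection step can be sketched as "run every method, score each result, keep the minimum". The text does not define how the error degree is measured. The default scorer below (the share of quotations left without a speaker) is a hypothetical placeholder, and the caller can pass a different one. A real scorer would need to tolerate quotations that are legitimately untagged, such as the FIG. 10 question. The method names and lambdas in the usage are stand-ins for recognition methods 1 to 4.

```python
from typing import Callable, Dict, Optional, Tuple

Tags = Dict[str, Optional[str]]  # quotation -> speaker (None if untagged)
Method = Callable[[str], Tags]


def untagged_ratio(tags: Tags) -> float:
    """Hypothetical error degree: share of quotations with no speaker (1.0 if none found)."""
    if not tags:
        return 1.0
    return sum(speaker is None for speaker in tags.values()) / len(tags)


def select_by_lowest_error(
    sentence: str,
    methods: Dict[str, Method],
    error: Callable[[Tags], float] = untagged_ratio,
) -> Tuple[Optional[str], Tags]:
    """Run every method and keep the result whose error degree is lowest."""
    best_name: Optional[str] = None
    best_tags: Tags = {}
    best_error = float("inf")
    for name, method in methods.items():
        tags = method(sentence)
        err = error(tags)
        if err < best_error:             # ties keep the earlier method
            best_name, best_tags, best_error = name, tags, err
    return best_name, best_tags


# Stand-ins for the four recognition methods.
METHODS: Dict[str, Method] = {
    "method1": lambda s: {"q1": None, "q2": None},
    "method2": lambda s: {"q1": "A", "q2": None},
    "method3": lambda s: {"q1": "A", "q2": "B"},
}
```

With these stand-ins, `select_by_lowest_error("...", METHODS)` picks `method3`, because it leaves no quotation untagged.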

FIG. 11 is merely one example of recognizing quotations and speakers. The quotation/speaker recognition system according to an embodiment of the present invention is not limited to the configuration of FIG. 11.

FIG. 12 is a flowchart illustrating the overall process of the quotation/speaker recognition method according to an embodiment of the present invention.

In step S1201, the quotation/speaker recognition system can analyze the document to recognize quotations and speakers. Referring to FIG. 12, the step of recognizing quotations and speakers (S1201) may include steps S1204, S1205, S1206, and S1207.

In step S1204, the system can extract sentences from the document and recognize the quotations in each sentence.

In step S1205, the system can analyze the sentence to determine speaker candidates for each quotation. According to an embodiment of the present invention, quotations and speaker candidates can be recognized through the first through fourth recognition methods, which draw on a sentence-analysis approach and a pattern-recognition approach.

In step S1206, the system can analyze the sentence to select, from among the speaker candidates, the speaker who uttered the quotation. Specifically, the speaker selection unit (106) can determine the relationship between a quotation and a speaker by analyzing the context of the quotation. This selection is particularly important when a sentence contains multiple quotations.

In step S1207, the system can recognize the unique name of the selected speaker. If the speaker is expressed as "surname + title (position)", the system analyzes the document as a whole. It then recognizes the speaker's unique name from previously stored results for the speaker's original name.
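Step S1207 can be sketched as a two-pass lookup. The first pass collects full names that appear directly before a title anywhere in the document and stores them under (surname, title). The second pass resolves a "surname + title" candidate such as '권 의원' against that store. This is a minimal sketch under strong assumptions. A full name is taken to be exactly three Hangul syllables followed by a space and a title. The title list is hypothetical, and 홍길동 is a placeholder name. A real system would need a person-name dictionary, because a three-syllable party name directly before a title would also match.

```python
import re
from typing import Dict, Optional, Tuple

TITLES = ("원내대표", "의원", "대표", "위원장")  # hypothetical title list (longest first where they overlap)
TITLE_RE = "|".join(TITLES)
# Three Hangul syllables not preceded by another Hangul syllable, then whitespace and a title.
FULL_NAME_RE = re.compile(rf"(?<![가-힣])([가-힣]{{3}})\s+({TITLE_RE})")
# A one-syllable surname, optional whitespace, then a title (e.g. '권 의원').
SHORT_FORM_RE = re.compile(rf"([가-힣])\s*({TITLE_RE})")


def build_name_store(document: str) -> Dict[Tuple[str, str], str]:
    """Map (surname, title) to the first full name seen with that title."""
    store: Dict[Tuple[str, str], str] = {}
    for full_name, title in FULL_NAME_RE.findall(document):
        store.setdefault((full_name[0], title), full_name)
    return store


def resolve_name(candidate: str, store: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Resolve a 'surname + title' candidate to a stored unique name, if any."""
    match = SHORT_FORM_RE.fullmatch(candidate)
    if match is None:
        return None
    return store.get((match.group(1), match.group(2)))
```

In a document that first mentions "홍길동 의원" and later "홍 의원", the short form resolves to 홍길동. A short form with no matching full mention resolves to None.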

In step S1202, the system can tag each quotation with the recognized speaker. When a sentence contains multiple quotations and multiple speakers, the system tags each quotation with its speaker and groups each pair into a set, which indexes the quotations by speaker. Indexed quotations can later be retrieved through a speaker search.

In step S1203, upon receiving a search request for a speaker, the system can search for the quotations uttered by that speaker and provide them as the search result. The quotations can be presented to users in various forms depending on the service.
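Steps S1202 and S1203 amount to an inverted index from speaker to quotations. This is a minimal in-memory sketch. The class name and the document identifiers are hypothetical. Quotations tagged with no speaker are not indexed, because they cannot be reached through a speaker search.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Optional, Tuple


class QuotationIndex:
    """In-memory speaker -> quotations index (a hypothetical stand-in for S1202/S1203)."""

    def __init__(self) -> None:
        self._by_speaker: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, doc_id: str, tagged: Iterable[Tuple[str, Optional[str]]]) -> None:
        """S1202: store each (quotation, speaker) set under its speaker."""
        for quotation, speaker in tagged:
            if speaker is not None:          # untagged quotations are not indexed
                self._by_speaker[speaker].append((doc_id, quotation))

    def search(self, speaker: str) -> List[Tuple[str, str]]:
        """S1203: return (document id, quotation) pairs uttered by the speaker."""
        return list(self._by_speaker.get(speaker, []))
```

After the FIG. 7 and FIG. 10 results are indexed, a search for 정세균 returns only his quotation from the FIG. 7 document. The untagged FIG. 10 question is never returned.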

The step of analyzing the document and recognizing quotations and speakers (S1201) can be performed by the first through fourth recognition methods, as follows.

For example, the system can build a pattern-processing dictionary, use it to extract text tokens, and recognize the quotations and speaker of the sentence from those tokens (first recognition method). The system can recognize, as a quotation, the phrase from the point where a quotation mark opens to the point where it closes (S1204). It can also determine the noun phrase located before the nominative particle as a speaker candidate (S1205).
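The first recognition method can be sketched with a small regular-expression "pattern-processing dictionary" that turns a raw sentence into text tokens without any morpheme analysis. This is a minimal sketch, and the dictionary entries are hypothetical. The nominative-particle rule looks only at the last syllable of a word, so it can misfire on verb forms that end in 는. The speaker candidate is the particle-bearing word plus at most one preceding word, so a longer noun phrase such as '정세균 민주당 대표' would be truncated.

```python
import re
from typing import List, Tuple

# Hypothetical pattern-processing dictionary: token type -> regular expression.
# Entries are tried in order at each position, so QUOTE wins over WORD.
PATTERN_DICTIONARY = [
    ("QUOTE", r'"[^"]*"'),                               # opening to closing quotation mark
    ("SUBJECT", r"[가-힣]+(?:은|는|이|가)(?=[\s,]|$)"),  # word ending in a nominative particle
    ("WORD", r'[^\s"]+'),                                # any other run of text
    ("SPACE", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in PATTERN_DICTIONARY))


def tokenize(sentence: str) -> List[Tuple[str, str]]:
    """Split a sentence into (token type, text) pairs, dropping whitespace."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(sentence)
            if m.lastgroup != "SPACE"]


def recognize_by_tokens(sentence: str) -> Tuple[List[str], List[str]]:
    """Return (quotations, speaker candidates) using only the text tokens."""
    tokens = tokenize(sentence)
    quotes = [text[1:-1] for kind, text in tokens if kind == "QUOTE"]
    candidates: List[str] = []
    for i, (kind, text) in enumerate(tokens):
        if kind == "SUBJECT":
            phrase = text[:-1]                       # strip the one-syllable particle
            if i > 0 and tokens[i - 1][0] == "WORD":
                phrase = f"{tokens[i - 1][1]} {phrase}"  # heuristic: one preceding word
            candidates.append(phrase)
    return quotes, candidates
```

On the FIG. 5 sentence (with ‥ written as ~), this sketch returns both quoted phrases and the candidate '권 의원'. It needs no part-of-speech tags.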

For example, the quotation/speaker recognition system may extract part-of-speech information from the sentence using a morpheme analyzer and recognize the quotation and the speaker of the document based on the part-of-speech information (second recognition method). In this case as well, the system may recognize, as a quotation, the phrase from the point where a quotation mark opens in the sentence to the point where it closes (S1204), and may determine a noun phrase located before a nominative case particle in the sentence as a speaker candidate (S1205).
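The second method can be sketched on top of morpheme-analyzer output. The example assumes the analyzer returns `(surface, tag)` pairs in the Sejong-style tagset used by several Korean analyzers (SS for quotation marks and other paired symbols, JKS for the nominative case particle, NNG/NNP for nouns). The function name and the simple "run of nouns before JKS" rule for a noun phrase are assumptions for illustration.

```python
# Sejong-style tags, as produced by several Korean morpheme analyzers.
NOUN_TAGS = {"NNG", "NNP"}  # general noun, proper noun
NOMINATIVE_TAG = "JKS"      # nominative case particle
SYMBOL_TAG = "SS"           # quotation marks, brackets, dashes
QUOTE_CHARS = {'"', "'", "\u201c", "\u201d", "\u2018", "\u2019"}


def recognize_with_pos(morphemes):
    """Return (quotations, speaker_candidates) from (surface, tag) pairs.

    Each quotation is returned as a tuple of morpheme surfaces, because
    rebuilding the original spacing would need the analyzer's offsets.
    """
    quotations, candidates = [], []
    inside, current, noun_run = False, [], []
    for surface, tag in morphemes:
        if tag == SYMBOL_TAG and surface in QUOTE_CHARS:
            if inside:  # closing mark ends the quotation (S1204)
                quotations.append(tuple(current))
                current = []
            inside = not inside
            noun_run = []
        elif inside:
            current.append(surface)
        elif tag in NOUN_TAGS:
            noun_run.append(surface)
        else:
            if tag == NOMINATIVE_TAG and noun_run:  # noun phrase + JKS (S1205)
                candidates.append(" ".join(noun_run))
            noun_run = []
    return quotations, candidates


if __name__ == "__main__":
    # Hand-tagged analysis of: 김철수 장관이 "경기가 회복되고 있다"고 말했다.
    example = [("김철수", "NNP"), ("장관", "NNG"), ("이", "JKS"), ('"', "SS"),
               ("경기", "NNG"), ("가", "JKS"), ("회복", "NNG"), ("되", "XSV"),
               ("고", "EC"), ("있", "VX"), ("다", "EF"), ('"', "SS"),
               ("고", "JKQ"), ("말하", "VV"), ("었", "EP"), ("다", "EF"), (".", "SF")]
    print(recognize_with_pos(example))
```

Note that 경기 + 가 inside the quotation is not reported as a candidate, because morphemes between quotation marks are collected into the quotation instead.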

For example, when a sentence contains a plurality of quotations, the quotation/speaker recognition system may extract part-of-speech information from the sentence using a morpheme analyzer and recognize the quotations and speakers of the document based on the part-of-speech information (third recognition method). The system may then tag each quotation with a speaker based on preset sentence patterns that express the relationship between speakers and quotations.

Here, the system may recognize, as a quotation, the phrase from the point where a quotation mark opens in the sentence to the point where it closes (S1204). The system may then determine a noun phrase located before a nominative case particle in the sentence as a speaker candidate (S1205).

The system may then identify the relationship between the speakers and the quotations according to the sentence patterns shown in FIG. 8, and match each quotation with the speaker candidate who uttered it.
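The sentence patterns of FIG. 8 are not reproduced in this section, so the sketch below substitutes a single hypothetical positional rule to show what matching several quotations to candidates looks like: each quotation goes to the nearest speaker candidate before it or, failing that, to the nearest one after it. Inputs are `(offset, text)` pairs such as the outputs of steps S1204 and S1205; `match_by_position` is an invented name.

```python
def match_by_position(quotations, candidates):
    """Tag each quotation with a speaker using one hypothetical position rule.

    quotations, candidates: lists of (offset, text) pairs (assumed format).
    Stand-in for the FIG. 8 patterns: the nearest candidate before the
    quotation wins; if none precedes it, the nearest one after it is used.
    """
    pairs = []
    for q_pos, q_text in quotations:
        before = [c for c in candidates if c[0] < q_pos]
        after = [c for c in candidates if c[0] > q_pos]
        if before:
            speaker = max(before)[1]  # closest preceding candidate
        elif after:
            speaker = min(after)[1]   # otherwise the closest following one
        else:
            speaker = None            # no candidate in the sentence
        pairs.append((q_text, speaker))
    return pairs
```

For '"지금은 어렵다"고 김 장관이 말하자 박 의원이 "그렇지 않다"고 반박했다.', this rule attributes the first quotation to 김 장관 and the second to 박 의원. A word order such as '김 장관이 "A"라고 말하자 "B"라고 박 의원이 반박했다' would defeat it, which suggests why a set of sentence patterns rather than one rule is needed.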

For example, when a sentence contains a plurality of quotations as well as a question keyword, the quotation/speaker recognition system may extract part-of-speech information from the sentence using a morpheme analyzer and recognize the quotations and speakers of the document based on the part-of-speech information (fourth recognition method). In other words, the fourth recognition method handles an exception within the multiple-quotation case: a sentence that also contains a question keyword such as 질문 ("question") or 물음 ("inquiry").

The system may then tag each quotation with a speaker based on the question keyword contained in the sentence and on the preset sentence patterns that express the relationship between speakers and quotations. When the sentence contains a question keyword, however, the speaker-quotation relationships given by the sentence patterns in FIG. 8 cannot be applied as they are. Instead, the system may tag each quotation with a speaker according to the sentence patterns in FIG. 8 while taking into account how the quotation indicated by the question keyword relates to the remaining quotations and to the speaker candidates.
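The exception handled by the fourth method can be illustrated by extending the same kind of hypothetical positional rule: a quotation followed closely by 질문 or 물음 (as in "…라는 질문에") is treated as the question and is not attributed to the nominative speaker candidate, while the remaining quotations are matched as before. The look-ahead `window`, the `"question"`/`"statement"` labels, and the function name are assumptions; identifying who asked the question would need a further pattern not shown here.

```python
QUESTION_KEYWORDS = ("질문", "물음")


def match_with_question_keywords(sentence, quotations, candidates, window=6):
    """Hypothetical fourth-method matching.

    quotations, candidates: lists of (offset, text) pairs, where a quotation
    offset points at its single-character opening mark (assumed format).
    Returns (quotation, speaker, kind) triples.
    """
    results = []
    for q_pos, q_text in quotations:
        close = q_pos + len(q_text) + 1               # index of the closing mark
        tail = sentence[close + 1:close + 1 + window]  # text right after it
        if any(k in tail for k in QUESTION_KEYWORDS):
            # The quoted question is not uttered by the nominative speaker.
            results.append((q_text, None, "question"))
            continue
        before = [c for c in candidates if c[0] < q_pos]
        after = [c for c in candidates if c[0] > q_pos]
        speaker = max(before)[1] if before else (min(after)[1] if after else None)
        results.append((q_text, speaker, "statement"))
    return results
```

For '김 기자의 "A"라는 질문에 박 의원이 "B"라고 답했다.', a plain positional rule would attribute "A" to 박 의원 as well; with the keyword check, "A" is marked as the question and only "B" is tagged with 박 의원.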

Here, the system may recognize, as a quotation, the phrase from the point where a quotation mark opens in the sentence to the point where it closes (S1204). The system may then determine a noun phrase located before a nominative case particle in the sentence as a speaker candidate (S1205).

Once speaker candidates have been determined by any of the first to fourth recognition methods described above, the quotation/speaker recognition system may analyze the sentence and select, from among the speaker candidates, the speaker who uttered the quotation (S1206).

In particular, when a sentence contains a plurality of quotations, the system may use the third recognition method to match the quotations with the speakers according to the preset sentence patterns, and thereby select the speaker who uttered each quotation. When the sentence contains a plurality of quotations together with a question keyword (질문 or 물음), the system may use the fourth recognition method to match the quotations with the speakers based on the question keyword and the preset sentence patterns, and thereby select the speaker who uttered each quotation.
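The branching described above, one matching strategy per sentence type, might be dispatched as in the following sketch. The quotation regex, the return labels, and the function name are assumptions, and the keyword check scans the whole sentence for simplicity; a stricter version might look only outside the quotations.

```python
import re

QUESTION_KEYWORDS = ("질문", "물음")
# Straight or curly double-quoted spans (an assumed, simplified definition).
QUOTATION = re.compile(r'"[^"]*"|\u201c[^\u201d]*\u201d')


def choose_matching_method(sentence):
    """Decide which recognition method's matching step handles the sentence."""
    n = len(QUOTATION.findall(sentence))
    if n <= 1:
        return "first/second"  # at most one quotation: select among candidates
    if any(k in sentence for k in QUESTION_KEYWORDS):
        return "fourth"        # several quotations plus a question keyword
    return "third"             # several quotations, no question keyword
```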

For any details not described with reference to FIG. 12, refer to the description of FIGS. 1 to 11.

In addition, the quotation/speaker recognition method according to an embodiment of the present invention may include a computer-readable medium containing program instructions for performing various computer-implemented operations. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the present invention has been described above with reference to limited embodiments and drawings, the invention is not limited to these embodiments, and those of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description. Therefore, the spirit of the present invention should be determined only by the claims set forth below, and all equivalents and equivalent modifications thereof fall within the scope of the spirit of the present invention.

FIG. 3 is a diagram illustrating an example in which the first recognition method is applied to a sentence according to an embodiment of the present invention.

FIG. 7 is a diagram illustrating an example in which the third recognition method is applied to a sentence according to an embodiment of the present invention.

<Description of Reference Numerals for Main Parts of the Drawings>

100: Quotation/speaker recognition system

101: Quotation/speaker recognition unit

102: Speaker tagging unit

103: Search result providing unit

Claims

1. A quotation/speaker recognition system comprising:

a quotation/speaker recognition unit for analyzing a document and recognizing a quotation and a speaker; and

a speaker tagging unit for tagging the quotation with the recognized speaker,

wherein the quotation/speaker recognition unit comprises:

a quotation recognition unit for extracting a sentence from the document and recognizing a quotation in the sentence;

a speaker candidate determining unit for analyzing the sentence to determine speaker candidates for the quotation;

a speaker selection unit for analyzing the sentence and selecting, from among the speaker candidates, the speaker who uttered the quotation; and

a speaker recognition unit for recognizing the proper name of the selected speaker.

2. The system of claim 1,

wherein the quotation/speaker recognition unit

constructs a pattern processing dictionary to extract text tokens, and recognizes the quotation and the speaker of the sentence based on the text tokens.

3. The system of claim 1,

wherein the quotation/speaker recognition unit

extracts part-of-speech information from the sentence using a morpheme analyzer, and recognizes the quotation and the speaker of the document based on the part-of-speech information.

4. The system of claim 1,

wherein the quotation/speaker recognition unit,

when the sentence includes a plurality of quotations and speakers, extracts part-of-speech information from the sentence using a morpheme analyzer and recognizes the quotations and speakers of the document based on the part-of-speech information, and

wherein the speaker tagging unit

tags each quotation with a speaker based on a preset sentence pattern indicating the relationship between the speaker and the quotation.

5. The system of claim 1,

wherein the quotation/speaker recognition unit,

when the sentence includes a plurality of quotations and speakers, extracts part-of-speech information from the sentence using a morpheme analyzer and recognizes the quotations and speakers of the document based on the part-of-speech information, and

wherein the speaker tagging unit

tags each quotation with a speaker based on a question keyword included in the sentence and a preset sentence pattern indicating the relationship between the speaker and the quotation.

6. The system according to any one of claims 2 to 5,

wherein the quotation recognition unit

recognizes, as a quotation, the phrase from the point at which a quotation mark opens to the point at which the quotation mark closes, and

wherein the speaker candidate determining unit

determines, as a speaker candidate, a noun phrase located before a nominative case particle in the sentence.

7. The system of claim 1,

wherein the quotation/speaker recognition unit

recognizes the quotation and the speaker from the sentence by combining at least one preset recognition method for recognizing quotations and speakers.

8. The system of claim 1,

wherein the quotation/speaker recognition unit

recognizes quotations and speakers from the sentence by applying at least one preset recognition method for recognizing quotations and speakers, and

selects, in consideration of the degree of error of each recognition method, the quotation and the speaker obtained by the recognition method having the lowest degree of error.

9. The system of claim 1, further comprising

a search result providing unit that, upon receiving a search request for the speaker, searches for the quotations uttered by the speaker and provides them as search results.

10. A quotation/speaker recognition method comprising:

analyzing, by a quotation/speaker recognition system, a document to recognize a quotation and a speaker; and

tagging, by the quotation/speaker recognition system, the quotation with the recognized speaker,

wherein analyzing the document to recognize the quotation and the speaker comprises:

extracting, by the quotation/speaker recognition system, a sentence from the document and recognizing a quotation in the sentence;

analyzing, by the quotation/speaker recognition system, the sentence to determine speaker candidates for the quotation;

analyzing, by the quotation/speaker recognition system, the sentence to select, from among the speaker candidates, the speaker who uttered the quotation; and

recognizing, by the quotation/speaker recognition system, the proper name of the selected speaker.

11. The method of claim 10,

wherein analyzing the document to recognize the quotation and the speaker comprises

constructing, by the quotation/speaker recognition system, a pattern processing dictionary to extract text tokens, and recognizing the quotation and the speaker of the sentence based on the text tokens.

12. The method of claim 10,

wherein analyzing the document to recognize the quotation and the speaker comprises

extracting, by the quotation/speaker recognition system, part-of-speech information from the sentence using a morpheme analyzer, and recognizing the quotation and the speaker of the document based on the part-of-speech information.

13. The method of claim 10,

wherein analyzing the document to recognize the quotation and the speaker comprises,

when the sentence includes a plurality of quotations and speakers, extracting, by the quotation/speaker recognition system, part-of-speech information from the sentence using a morpheme analyzer and recognizing the quotations and speakers of the document based on the part-of-speech information, and

wherein tagging the quotation with the recognized speaker comprises

tagging, by the quotation/speaker recognition system, each quotation with a speaker based on a preset sentence pattern indicating the relationship between the speaker and the quotation.

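14. The method of claim 10,

wherein analyzing the document to recognize the quotation and the speaker comprises,

when the sentence includes a plurality of quotations and speakers, extracting, by the quotation/speaker recognition system, part-of-speech information from the sentence using a morpheme analyzer and recognizing the quotations and speakers of the document based on the part-of-speech information, and

wherein tagging the quotation with the recognized speaker comprises

tagging, by the quotation/speaker recognition system, each quotation with a speaker based on a question keyword included in the sentence and a preset sentence pattern indicating the relationship between the speaker and the quotation.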

15. The method according to any one of claims 11 to 14,

wherein recognizing the quotation in the sentence comprises

recognizing, by the quotation/speaker recognition system, as a quotation the phrase from the point at which a quotation mark opens to the point at which the quotation mark closes, and

wherein determining the speaker candidates for the quotation comprises

determining, by the quotation/speaker recognition system, a noun phrase located before a nominative case particle in the sentence as a speaker candidate.

16. The method of claim 10,

wherein recognizing the quotation and the speaker comprises

recognizing, by the quotation/speaker recognition system, the quotation and the speaker from the sentence by combining at least one preset recognition method for recognizing quotations and speakers.

17. The method of claim 10,

wherein recognizing the quotation and the speaker comprises

recognizing, by the quotation/speaker recognition system, quotations and speakers from the sentence by applying at least one preset recognition method for recognizing quotations and speakers, and selecting, in consideration of the degree of error of each recognition method, the quotation and the speaker obtained by the recognition method having the lowest degree of error.

18. The method of claim 10, further comprising

upon receiving a search request for the speaker, searching, by the quotation/speaker recognition system, for the quotations uttered by the speaker and providing them as search results.

19. A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 10 to 14 or 16 to 18.