KR101700819B1

KR101700819B1 - Apparatus and method for speech recognition

Info

Publication number: KR101700819B1
Application number: KR1020120118892A
Authority: KR
Inventors: 김승희; 김상훈
Original assignee: 한국전자통신연구원
Priority date: 2012-05-02
Filing date: 2012-10-25
Publication date: 2017-02-01
Anticipated expiration: 2032-10-25
Also published as: KR20130127901A

Abstract

The present invention relates to an apparatus for speech recognition and automatic interpretation that runs on a PC or a mobile device. A speech recognition apparatus according to the present invention includes: a display unit that displays to a user a screen for selecting a domain, a domain being a unit of a predetermined classification of speech recognition areas; a user input unit that receives the user's selection of a domain; and a communication unit that transmits the user's selection information for the domain. According to the present invention, a speech recognition apparatus with an intuitive and simple user interface is provided, so that the user can easily select or modify the designated domains of a speech recognition system, and the accuracy and performance of speech recognition and automatic interpretation through the designated speech recognition system can be improved.

Description

Apparatus and method for speech recognition

The present invention relates to an apparatus equipped with speech recognition and automatic interpretation functions, or to a speech recognition method, and more particularly to a method of selecting a domain of a database for speech recognition.

Because it is inefficient to train a conventional speech recognition or automatic interpretation system on the full vocabulary and range of expressions of many different fields, such a system is usually trained on only one area, that is, one domain. In most cases, a conventional speech recognition or automatic interpretation application does not allow the domain designated by default to be changed. Even where the user can select a domain directly, the selection is inconvenient to make and the available choices are very limited. As a result, adaptability to the speech recognition environment is poor, and the accuracy of speech recognition and automatic interpretation is reduced.

An object of the present invention is to improve the accuracy of speech recognition and automatic interpretation by providing users with a user interface for easily selecting the database referred to during speech recognition or automatic interpretation, that is, the domain, so that a domain can easily be selected to suit the situation.

To solve the above technical problem, a speech recognition apparatus according to an embodiment of the present invention includes: a display unit that displays to a user a screen for selecting a domain, a domain being a unit of a predetermined classification of speech recognition areas; a user input unit that receives the user's selection of a domain; and a communication unit that transmits the user's selection information for the domain.

According to the present invention, an intuitive and simple domain selection method can be provided to the user, thereby improving the accuracy and performance of speech recognition and automatic interpretation.

FIG. 1 is a network diagram for providing a speech recognition service according to an embodiment of the present invention.
FIG. 2 shows the configuration of a user terminal according to an embodiment of the present invention.
FIG. 3 illustrates the structure and relationships of designated domains according to an embodiment of the present invention.
FIGS. 4 to 11 illustrate screens displayed on the display unit of a user terminal according to an embodiment of the present invention.
FIG. 12 is a flowchart showing a method of designating a speech recognition domain according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the drawings.

FIG. 1 is a network configuration diagram for providing a speech recognition method according to an embodiment of the present invention.

The user terminal 10 receives speech and domain selection information from the user and passes them to the speech recognition server 20. The user terminal 10 may be any computing device, such as a PC, a notebook computer, or a smartphone, that has a communication function and allows the user to input speech or text.

Using the received speech and the information on the selected domain, the speech recognition server 20 performs speech recognition by referring to the data corresponding to the domain selected by the user among the reference data for speech recognition stored in the DB 30. It then transmits the speech recognition result to the user terminal 10.

The DB 30 stores the various data that the speech recognition server 20 needs for its speech recognition operation. The data referred to during speech recognition, such as corpora and language dictionaries, are stored per domain.
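The per-domain storage described above can be pictured with a minimal Python sketch. It is an illustration only: the patent does not specify a data layout, so the `DomainData` class, the in-memory `DB` dictionary, the sample entries, and the `reference_data` helper are all hypothetical names chosen here.

```python
from dataclasses import dataclass, field


@dataclass
class DomainData:
    """Hypothetical per-domain reference data: a corpus and a lexicon."""
    corpus: list = field(default_factory=list)
    lexicon: set = field(default_factory=set)


# Hypothetical in-memory stand-in for the DB (30); contents are invented.
DB = {
    "General": DomainData(["hello", "thank you"], {"hello", "thank", "you"}),
    "Touring": DomainData(["where is the airport"], {"airport", "hotel"}),
}


def reference_data(selected_domains):
    """Merge the reference data of the domains the user selected.

    Unknown domain names are skipped (an assumption of this sketch).
    """
    merged = DomainData()
    for name in selected_domains:
        data = DB.get(name)
        if data is None:
            continue
        merged.corpus.extend(data.corpus)
        merged.lexicon |= data.lexicon
    return merged
```

A recognizer would then consult only the merged data, for example `reference_data(["General", "Touring"])`, instead of everything stored in the database.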

The user terminal 10 is described in more detail below with reference to FIG. 2.

As shown in FIG. 2, the user terminal 10 according to the present embodiment may include a display unit 100, a user input unit 200, and a communication unit 300.

The display unit 100 displays the information needed for speech recognition and may display menus with which the user designates the domains to be referred to for speech recognition. In the present embodiment, the speech recognition server 20 is a system that receives a speech signal and recognizes its meaning, and it performs speech recognition based on the designated domain specified by the user or on the general domain.

The general domain is a database referred to in order to support speech recognition of language in general use, not tied to any specific domain. A designated domain is a database selected, automatically or by the user, for a specific situation in order to support more accurate speech recognition than the general domain. For example, if the input speech relates to travel, speech recognition can be performed with the 'travel' domain as the designated domain, which can produce a better recognition result than when only the general domain is selected.

The concept of a speech recognition domain is described in more detail with reference to FIG. 3. In the present embodiment, a speech recognition domain is a unit for classifying the areas of speech recognition, that is, a database referred to during the speech recognition process.

Referring to FIG. 3, as described above, the speech recognition server 20 can operate on the general domain 31 by default or by the user's selection. The general domain has first sub-domains 32 as designated domains, and may have second sub-domains 33 whose parent domain is a first sub-domain 32. Although not shown, third and fourth sub-domains whose parent domain is a second sub-domain may also be included.

These second sub-domains may replace some characteristics (words or expressions) of the parent domain, or may have characteristics that the parent domain lacks. The domains may also overlap one another. For example, the Touring domain may partially overlap the Business domain, which is a different domain, and part of the Restaurant domain, a sub-domain of the Touring domain, may also overlap the Business domain.
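One way to read the parent and sub-domain relationship is as layered dictionaries, where a sub-domain's entries override or extend those of its parent. The sketch below assumes exactly that; the `Domain` class, the `entries` mapping, and the sample words are hypothetical and are not taken from the patent.

```python
class Domain:
    """Hypothetical domain node: a label, its own entries, and a parent."""

    def __init__(self, label, entries=None, parent=None):
        self.label = label
        # word or expression -> domain-specific data (invented for this sketch)
        self.entries = dict(entries or {})
        self.parent = parent

    def effective_entries(self):
        """Return parent entries overridden or extended by this domain's own."""
        merged = self.parent.effective_entries() if self.parent else {}
        merged.update(self.entries)
        return merged


general = Domain("General", {"table": "furniture", "check": "verify"})
touring = Domain("Touring", {"check": "check-in"}, parent=general)
restaurant = Domain(
    "Restaurant",
    {"check": "bill", "galbi": "Korean grilled ribs"},
    parent=touring,
)
```

Here `restaurant.effective_entries()` keeps "table" from General, replaces "check" with the restaurant sense, and adds "galbi", which no ancestor has. Overlap between sibling domains is simply two domains holding the same key.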

The configuration of the domain selection screen displayed by the display unit 100 according to the present embodiment is described below with reference to the drawings.

In the present embodiment, the display unit 100 displays the domains that the user can select or deselect. Referring to FIG. 4, the display unit 100 may display the domains in a hierarchical or tree structure. As shown in FIG. 4, each domain may be represented by a label, which is the name of the designated domain.

The general domain is represented by a label named 'General' and may include four sub-domains: a travel domain 47, a business domain, a conference domain 45, and a medical domain 46.

The travel-related domain is represented as a domain 47 labeled 'Touring' and may in turn include three sub-domains: a Restaurant domain, an Airport domain, and a Car Rent domain. The restaurant domain is represented as a domain labeled 'Restaurant' and may include further sub-domains according to the type of restaurant. In FIG. 4, for example, the Korean restaurant domain is labeled 'Korean Food' and the Chinese restaurant domain is labeled 'Chinese Food'.

The Korean restaurant domain may include language data on Korean dishes and restaurant names to support speech recognition.

The conference domain may be represented by a label 45 named 'Conference' and may include a computer engineering domain and a mechanical engineering domain as sub-domains. The computer engineering domain may be labeled 'Computer Science', and the mechanical engineering domain may be labeled 'Mechanical Engineering'. Because specialized vocabulary is used more frequently in the conference domain than in other areas, subdividing the speech recognition area by the field of the conference and providing a correspondingly designated speech recognition service can improve recognition accuracy and, in turn, interpretation accuracy.

The user input unit 200, which receives the user's selection of a domain through the display unit 100, is described below.

Referring again to FIG. 4, in the present embodiment the display unit 100 of the user terminal 10 presents the domains to the user as a tree, and the user can select a domain 45 by dragging and dropping (43) it into the designated-domain display area 42 with a mouse or a touch gesture. A domain 44 that is already selected can be deselected by dragging and dropping it out of the designated-domain display area 42.

The domain tree can show or hide sub-domains by means of a '+' button 46 or a '-' button 47.

Tree nodes for domains that are already selected may be rendered differently so that the user can avoid unnecessary reselection. In addition, because the General domain in the designated-domain display area 42 is selected in advance, it is displayed differently from the other domains so that the user is aware of this.

Referring to FIG. 4, in the selectable-domain display area of the display unit, the domains already selected as designated domains, namely 'General', 'Touring', 'Restaurant', and 'Korean Food', are displayed differently (48) from unselected domains to show the user that they are selected. The selected domains also appear in the designated-domain display area 42. Among them, the 'General' domain 48 is selected by default and cannot be deselected, so it is rendered differently (49) from the other selected domains to inform the user of this.
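The selection behavior described for FIG. 4 (General always selected, other domains freely added and removed, no duplicate selection) can be sketched as a small state holder. The `DomainSelection` class and its method names are hypothetical; the patent describes the user interface, not this code.

```python
class DomainSelection:
    """Hypothetical model of the designated-domain display area (42)."""

    ALWAYS_SELECTED = "General"  # selected by default and cannot be deselected

    def __init__(self):
        self._selected = [self.ALWAYS_SELECTED]

    def select(self, label):
        """Add a domain; return False if it was already selected."""
        if label in self._selected:
            return False
        self._selected.append(label)
        return True

    def deselect(self, label):
        """Remove a domain; General and unselected domains are refused."""
        if label == self.ALWAYS_SELECTED or label not in self._selected:
            return False
        self._selected.remove(label)
        return True

    @property
    def selected(self):
        """Selected labels in selection order, as an immutable tuple."""
        return tuple(self._selected)
```

A drag-and-drop handler, a check box, or a double-click could all call the same `select` and `deselect` methods, which keeps the rule about the General domain in one place.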

FIG. 4 also shows the user dragging (43) the 'Conference' domain 45 into the designated-domain display area 42 in order to designate the conference-related domain. Besides dragging and dropping a domain's label, in the present embodiment a designated domain can also be selected by a mouse double-click or through a menu opened by a right-click.

Referring to FIG. 5, instead of drag and drop, the user's selection of a domain can be shown by checking (52) an empty check box 51 with a click or touch. For the General domain (53), which must always remain selected, the check mark can be shown in a different color to inform the user that it is always selected.

FIG. 6 shows an example of a dynamic presentation of the user interface for selecting domains according to the present invention. FIG. 6 presents the tree structure of FIGS. 4 and 5 more dynamically so that the user can understand and operate it more easily. When the user clicks or touches a domain, its child nodes are shown as sub-domains of that domain and can be selected (61). Designated domains that are not selected are displayed differently (62) to the user.

FIG. 7 shows an example of a user interface for selecting designated domains that have a flat structure with no hierarchy. Each designated domain is represented by an icon. The domain's label is displayed below the icon to tell the user which domain the icon corresponds to. Although all icons have the same shape in FIG. 7, an icon may instead be given an intuitive shape, for example a '+' shape for the 'Medical' icon, so that the user can tell at a glance which domain it represents.

The screen has a selectable-domain display area 74 that shows the user which domains can be selected and, at the bottom of the screen, a designated-domain display area 71 that shows the user which designated domains are currently selected and can be deselected. The user selects a designated domain by clicking or touching the corresponding domain 72 in the selectable-domain display area, or by dragging and dropping it into the designated-domain display area 71. The icon of a selected domain is removed from the selectable-domain display area so that the user can avoid unnecessary reselection. Likewise, the user deselects an already selected domain by clicking or touching it, or by dragging and dropping it into the selectable-domain display area. The deselected domain reappears in the selectable-domain display area 74 so that it can be selected again. A scroll bar 75 is provided so that the user can easily reach a large number of designated domains. In the present embodiment, scrolling changes the icons shown in the selectable-domain display area, but the designated-domain display area preferably stays fixed regardless of scrolling so that icons can still be selected into it.

FIG. 8 is an example of a user interface for hierarchically structured designated domains, based on FIG. 7. When the user clicks or touches the icon of a domain 81 whose lower-level designated domains the user wants to see, icons for that domain's sub-domains appear in a sub-area display area. The sub-area display area 82 is drawn with a border so that it is visually distinct from the higher-level designated domains. The icons 84 in this area can also be selected as designated domains by dragging and dropping them into the designated-domain display area 83 or by a click or touch. When a higher-level designated domain is selected or deselected, the lower-level designated domains it contains are automatically selected or deselected as well, which supports selecting or deselecting a whole group at once.

In FIG. 8, when the user selects the icon 81 corresponding to the 'Seoul' domain, the sub-domains of the 'Seoul' domain, such as 'Seoul Hotel' and 'Seoul Restaurant', are displayed. The user can select the Seoul hotel domain as a designated domain by touching 'Seoul Hotel' (84) among the displayed icons or by dragging and dropping it (84) into the designated-domain display area 83.
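The group selection described for FIG. 8, where selecting or deselecting a higher-level domain also applies to the domains beneath it, can be sketched as follows. The `CHILDREN` table reuses the 'Seoul' labels from the text, while the table layout and the function names `descendants` and `set_selected` are assumptions of this sketch.

```python
# Hypothetical parent -> children table using the labels from FIG. 8.
CHILDREN = {
    "Seoul": ["Seoul Hotel", "Seoul Restaurant"],
    "Seoul Hotel": [],
    "Seoul Restaurant": [],
}


def descendants(label, children=CHILDREN):
    """Return every domain below `label` in the hierarchy."""
    found = []
    stack = list(children.get(label, []))
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children.get(node, []))
    return found


def set_selected(selected, label, on, children=CHILDREN):
    """Return a new selection set with `label` and its descendants toggled."""
    affected = {label, *descendants(label, children)}
    return (selected | affected) if on else (selected - affected)
```

Selecting 'Seoul' therefore selects both of its sub-domains, while deselecting only 'Seoul Hotel' leaves 'Seoul' and 'Seoul Restaurant' selected. Whether a parent should stay selected when one child is removed is a design choice the text does not settle.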

Furthermore, the display unit 100 according to the present embodiment may determine the user's situation from user information collected through the user's terminal and display to the user the domains recommended for the identified situation. FIG. 9 shows an example of identifying the user's situation and suggesting suitable designated domains.

The user information collected through the user's terminal includes the user's location obtained from a GPS (Global Positioning System) receiver built into the terminal, information about the surroundings obtained through the camera, and ambient sound recognized through the microphone; this information is used to determine the user's situation. The speech recognition apparatus of the present embodiment thus recommends selectable domains to the user based on information about the user's situation. For example, Seoul, Korea may be recommended as a designated domain based on the user's GPS location, and if the camera indicates that the user is in a restaurant district, the travel and restaurant domains may be recommended. Likewise, if the sound of aircraft taking off and landing is recognized through the microphone, the airport domain may be recommended.

Accordingly, in the present embodiment the display unit 100 preferably highlights the recommended designated domains on the screen. This helps the user pick out only the designated domains that are needed, which prevents recognition support data for unnecessary vocabulary and sentence expressions from being added and thereby yields faster and more accurate speech recognition and automatic interpretation results. Conversely, designated domains that the situation information indicates are unnecessary or unlikely to be used may be dimmed or hidden, which makes the screen easier for the user to read and prevents unnecessary selections.

Referring to FIG. 10, when the user's location is recognized from current GPS information as Yeosu, Korea (101), the icons 102 corresponding to Yeosu-related domains (Yeosu Hotel, Yeosu Restaurant, Yeosu Expo) are highlighted, and the icon 103 corresponding to a domain of low relevance (Medical) is dimmed.
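The situation-based recommendation of FIGS. 9 and 10 can be sketched as a few rules over already-interpreted sensor results. Everything below is hypothetical: the `Context` fields, the scene and sound labels, the city-prefix matching rule, and the simplification that every non-recommended domain is dimmed are assumptions made for illustration, not the patent's method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    """Hypothetical, already-interpreted sensor results."""
    city: Optional[str] = None   # e.g. resolved from the GPS location
    scene: Optional[str] = None  # e.g. a label derived from the camera
    sound: Optional[str] = None  # e.g. a label derived from the microphone


def recommend(ctx, available):
    """Return the subset of available domain labels to recommend."""
    pool = set(available)
    recommended = set()
    if ctx.city:
        # Assumed rule: city-related domains are prefixed with the city name.
        recommended |= {d for d in pool if d.startswith(ctx.city)}
    if ctx.scene == "restaurant_street":
        recommended |= {"Touring", "Restaurant"} & pool
    if ctx.sound == "aircraft":
        recommended |= {"Airport"} & pool
    return recommended


def display_styles(ctx, available):
    """Map each domain to 'highlight', 'dim', or 'normal'."""
    rec = recommend(ctx, available)
    if not rec:
        return {d: "normal" for d in available}
    return {d: ("highlight" if d in rec else "dim") for d in available}
```

With `Context(city="Yeosu")` and the four domains of FIG. 10, the three Yeosu domains come out highlighted and 'Medical' dimmed. A real system would need a relevance score rather than this all-or-nothing rule.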

Furthermore, the display unit 100 according to the present embodiment may display to the user at least one piece of example recognition data that illustrates the level of speech recognition obtained with the selected domains.

Referring to FIG. 11, when the user is considering adding the 'Conference' domain, the recognition example area 114 can show that, if the 'Conference' domain is selected as a designated domain, sentences at the level of "Where is the nearest Gal-bi buffet from the 8th Advanced Computing Conference hall?" (115) can be recognized, which helps the user decide which designated domains to select.

In the present embodiment described above, the user input unit 200 receives the domains selected by the user through the domain selection screen displayed on the display unit 100, and the communication unit 300 transmits information on the selected domains to the speech recognition server 20.

Using the received information on the selected domains, the speech recognition server 20 performs speech recognition by referring to the data corresponding to the domains selected by the user among the reference data for speech recognition stored in the DB 30. It then transmits the speech recognition result to the user terminal 10.

In the embodiment described above, the speech recognition server 20 is described as a system with which the user terminal 10 communicates to obtain speech recognition results. Depending on the system performance of the user terminal 10, however, the speech recognition server 20 may be implemented as a speech recognition module inside the terminal and the DB 30 as internal memory. In that case, the communication unit 300 of the user terminal 10 transmits the information on the selected domains to the internal speech recognition module rather than to an external speech recognition server 20.

That is, in this case the components of the user terminal 10, namely the display unit 100, the user input unit 200, and the communication unit 300, operate as an interface module for selecting speech recognition domains, and the speech recognition server is implemented as a speech recognition module that performs speech recognition in conjunction with it.

The speech recognition apparatus 10 according to the embodiment of the present invention described above provides users with a user interface that is easy to understand and with which designated domains can be changed easily, which improves adaptability to a changing environment and thus the accuracy of speech recognition and automatic interpretation. A method of designating a speech recognition area using the speech recognition apparatus 10 according to the present embodiment is described below.

Referring to FIG. 12, the method of designating a speech recognition area includes a domain selection screen display step (S100), a domain selection input step (S200), and a selection information transmission step (S300).

In the domain selection screen display step (S100), the display unit 100 described above displays to the user a screen for selecting a domain, a domain being a unit of a predetermined classification of speech recognition areas used for the designated speech recognition of the speech recognition server 20.

In the domain selection input step (S200), the user input unit 200 described above receives the user's selection of a domain.

In the selection information transmission step (S300), the communication unit 300 described above transmits the user's selection information for the domain to the speech recognition server 20.

The detailed operation of each step of the above method is the same as described for the display unit 100, the user input unit 200, and the communication unit 300, so a repeated description is omitted.
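The three steps S100 to S300 can be sketched as one function that takes the display, input, and transmission behavior as callables. The function name `designate_domains`, the callable parameters, and the JSON payload shape are hypothetical; the patent does not define a message format.

```python
import json


def designate_domains(show_screen, read_selection, send, domains):
    """Run the three steps of FIG. 12 with injected behaviors.

    show_screen, read_selection, and send stand in for the display unit,
    the user input unit, and the communication unit respectively.
    """
    show_screen(domains)                      # S100: display the selection screen
    chosen = list(read_selection(domains))    # S200: receive the user's selection
    payload = json.dumps({"selected_domains": chosen})
    send(payload)                             # S300: transmit the selection info
    return payload
```

Because the transmission step is just a callable, the same flow covers both variants in the description: `send` may post to an external speech recognition server or hand the payload to a recognition module inside the terminal.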

Although the above examples have been described mainly in terms of speech recognition, they apply equally to automatic interpretation, because automatic interpretation also requires speech recognition. For example, the speech recognition server 20 of FIG. 1 may be an automatic interpretation server.

The various aspects, embodiments, implementations, and features of the present invention described above may be used individually or in any combination, and the various embodiments described herein may be implemented in, for example, software, hardware, or a combination thereof. In a hardware implementation, the embodiments described herein may be implemented using at least one of ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), processors, controllers, micro-controllers, microprocessors, and other electrical units for performing functions. The software may be implemented as computer-readable code on a computer-readable medium. A computer-readable medium is any data storage device that can store data that can later be read by a computer system. Examples of computer-readable media include read-only memory, random access memory, CD-ROM, DVD, magnetic tape, and optical data storage devices. The computer-readable medium may also be distributed across network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner.

The present invention has been described above with reference to its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the invention may be implemented in modified forms without departing from its essential characteristics. Therefore, the disclosed embodiments should be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the appended claims rather than by the foregoing description, and all differences within the scope of equivalents thereof should be construed as being included in the present invention.

Claims

A display unit including a screen displaying a plurality of domains selectable by the user for speech recognition;
A user input unit for receiving a selection of at least one of the plurality of domains from the user; and
A communication unit for transmitting the user's selection information for the domain,
Wherein the plurality of domains include a general domain including a database usable for speech recognition and at least one other domain including another database usable for speech recognition.

The apparatus according to claim 1,
Wherein the display unit displays a domain selectable by the user and a domain selectable by the user.

The apparatus according to claim 1,
Wherein the display unit displays the domains by classifying them into a hierarchy according to the speech recognition level.

The apparatus of claim 3,
Wherein the display unit displays a domain for the domain selected from the user among the displayed domains classified into hierarchies.

The apparatus of claim 3,
Wherein the hierarchy according to the speech recognition level is classified according to the situation in which speech occurs, and each situation is further classified according to the place where the speech occurs.

The apparatus of claim 3,
Wherein, upon the user's selection of a domain, the display unit displays the domains corresponding to a lower layer of the selected domain.

The apparatus according to claim 1,
Wherein the display unit identifies the user's situation using user information collected through the user's terminal, and displays a recommended domain to the user according to the identified situation information.

The apparatus according to claim 1,
Wherein the display unit displays at least one example recognition data to the user to illustrate to the user a speech recognition level designated according to the selection of the domain.

A domain selection screen display step of displaying a screen for selecting at least one of a plurality of domains as a unit for a speech recognition area of a predetermined classification for speech recognition;
A domain selection input step of receiving a selection of at least one of the plurality of domains from the user; and
A selection information transmitting step of transmitting the user's selection information for at least one of the plurality of domains,
Wherein the plurality of domains include a general domain including a database usable for speech recognition and at least one other domain including another database usable for speech recognition.

The method of claim 9,
Wherein the domain selection screen display step displays the domain selectable by the user or the domain selectable by the user.

The method of claim 9,
Wherein the domain selection screen display step displays the domains by classifying them into a hierarchy according to the speech recognition level.

The method of claim 11,
Wherein the domain selection screen display step displays a domain of the domain selected from the user among the displayed domains classified into hierarchies.

The method of claim 11,
Wherein the hierarchy according to the speech recognition level is a hierarchical structure in which a general area providing a basic speech recognition area is classified according to the situation in which speech occurs, and each situation is further classified according to the place where the speech occurs.

The method of claim 11,
Wherein, upon the user's selection of a domain, the domain selection screen display step displays the domains corresponding to a lower layer of the selected domain.

The method of claim 11,
Wherein the domain selection screen display step identifies the user's situation using user information collected through the user's terminal and displays a recommended domain to the user according to the identified situation information.

The method of claim 11,
Wherein the domain selection screen display step displays at least one example recognition data to the user to illustrate to the user a speech recognition level designated according to the selection of the domain.

delete

An interface module for displaying to the user a screen for selecting at least one domain among a plurality of domains for speech recognition, receiving a selection of the domain from the user, and transmitting the user's selection information for the domain; and
A speech recognition module for performing speech recognition by referring, based on the received selection information of the user, to data corresponding to the domain selected by the user among the reference data for speech recognition,
Wherein the plurality of domains include a general domain including a database usable for speech recognition and at least one other domain including another database usable for speech recognition.