KR102726966B1 - Manufacturing facility control method through voice recognition

Info

Publication number: KR102726966B1
Application number: KR1020230138933A
Authority: KR
Inventors: 안동욱; 남상도; 손진호; 원광재; 박문영
Original assignee: (주)미소정보기술
Priority date: 2023-10-17
Filing date: 2023-10-17
Publication date: 2024-11-06
Anticipated expiration: 2043-10-17

Abstract

The present invention relates to a method for controlling manufacturing facilities through voice recognition that reduces bottlenecks and errors by distributing the processing of the control commands used to control those facilities.

Description

{Manufacturing facility control method through voice recognition}

본 발명은 음성인식을 통한 제조 설비 제어 방법에 관한 것으로, 더 자세하게는, 제조 설비를 제어하기 위한 제어명령을 분산하여 처리함으로써, 병목 현상 및 오류를 개선하도록 한 음성 인식을 통한 제조 설비 제어 방법에 관한 것이다. The present invention relates to a method for controlling manufacturing facilities through voice recognition, and more specifically, to a method for controlling manufacturing facilities through voice recognition that improves bottlenecks and errors by distributing and processing control commands for controlling manufacturing facilities.

음성인식 기술이 발전함에 따라 각종 전자장치, 예컨대 TV, 스피커, 휴대폰 등과 같은 장치에 음성인식 기술이 적용되고 있다. 음성인식 제어의 통상적인 방식은, 사용자의 발화(發話)(utterance)를 데이터로 변환하여 인식하고, 인식된 발화에 대해 그 내용을 분석하여 실행시킨다. 이때 인식된 발화의 의미는 사전에 구축되어 있는 데이터베이스에 대한 조회를 통해 확인된다.As voice recognition technology advances, voice recognition technology is being applied to various electronic devices, such as TVs, speakers, and mobile phones. The conventional method of voice recognition control is to convert the user's utterance into data, recognize it, analyze the content of the recognized utterance, and execute it. At this time, the meaning of the recognized utterance is confirmed through a query to a database built in advance.

For example, if a user utters "Tell me tomorrow's weather", the user's command is "tell me" and the target of the command is "tomorrow's weather". The database has a category called <Weather>, which contains words such as "tomorrow's weather", "today's weather", "New York weather", "Washington weather", "weather forecast", and "weather conditions". The database also has words such as "tell me", "let me hear", and "how is it?" pre-registered as commands that a user is expected to issue for the <Weather> category, each meaning a request to report the weather. Therefore, when the user utters "Tell me tomorrow's weather", the voice recognition system interprets the utterance as a request for information about tomorrow's weather and performs the corresponding action.
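The database lookup described above can be illustrated with a minimal sketch. The table layout, the name `DATABASE`, and the function `interpret` are assumptions made for illustration only; the conventional systems referred to here do not necessarily store or match words this way.

```python
from typing import Optional, Tuple

# Hypothetical pre-built database: one category with its registered
# target words and command words (entries mirror the example above).
DATABASE = {
    "weather": {
        "targets": [
            "tomorrow's weather", "today's weather", "new york weather",
            "washington weather", "weather forecast", "weather conditions",
        ],
        "commands": ["tell me", "let me hear", "how is it"],
    },
}


def interpret(utterance: str) -> Optional[Tuple[str, str, str]]:
    """Return (category, command, target) if both are registered, else None."""
    text = utterance.lower()
    for category, entry in DATABASE.items():
        # Simple substring matching; a real recognizer would be more robust.
        target = next((t for t in entry["targets"] if t in text), None)
        command = next((c for c in entry["commands"] if c in text), None)
        if target is not None and command is not None:
            return category, command, target
    return None
```

An utterance that matches no registered category, such as a request unrelated to weather, returns `None` in this sketch, which reflects the limitation that meaning is resolved only against words registered in advance.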

Such conventional voice recognition systems are increasingly being applied to the control of manufacturing facilities such as factories. As one such conventional technology, Korean Patent Publication No. 10-2021-0086020 (July 8, 2021) discloses a "Method and system for controlling factory facilities using deep learning-based voice recognition". That publication relates to a method for controlling factory facilities using deep learning-based voice recognition, comprising the steps of: upon receiving a sound source signal, separating the sound source signal into a voice signal and a noise signal using a deep neural network learning algorithm; converting the voice signal into a text signal; transmitting the text signal; and, in the factory facility control unit that has received the text signal, analyzing the text signal into an intent and an entity and controlling the entity to perform an action corresponding to the intent, wherein the deep neural network is a neural network trained to extract only the voice signal from a sound source signal that includes a noise signal.

그러나, 상기 종래의 "딥러닝에 기반한 음성 인식을 이용하여 공장 설비를 제어하는 방법 및 시스템"은 복수의 사용자로부터 제어 명령이 들어오는 대규모 제조 설비 제어 분야에서는, 다수의 입력 음성을 처리하기 어려웠으며, 이에 따라, 대규모 제조 설비 제어 분야에 적합하도록, 복수의 사용자로부터 제어 명령 처리를 원활히 할 수 있는 프로세스에 대한 필요성이 커지고 있는 추세다. However, the above conventional "method and system for controlling factory equipment using voice recognition based on deep learning" has had difficulty processing a large number of input voices in the field of large-scale manufacturing equipment control where control commands come from multiple users. Accordingly, there is a growing need for a process that can smoothly process control commands from multiple users to be suitable for the field of large-scale manufacturing equipment control.

대한민국 공개특허공보 제10-2021-0086020호(2021.07.08.공개), "딥러닝에 기반한 음성 인식을 이용하여 공장 설비를 제어하는 방법 및 시스템"Korean Patent Publication No. 10-2021-0086020 (published on July 8, 2021), "Method and system for controlling factory equipment using deep learning-based voice recognition"

상기와 같은 문제점을 해결하기 위하여 본 발명은 제조 설비를 제어하기 위한 제어명령을 분산하여 처리함으로써, 병목 현상 및 오류를 개선하도록 한 음성 인식을 통한 제조 설비 제어 방법을 제공하는데 목적이 있다. In order to solve the above problems, the present invention aims to provide a method for controlling manufacturing facilities through voice recognition, which improves bottlenecks and errors by distributing and processing control commands for controlling manufacturing facilities.

상기와 같은 목적을 달성하기 위하여 본 발명은 화자인식 및 STT 수행단계(S10); 의도추론 수행 단계(S20); 및, 연동된 설비로 명령을 전달하는 명령전달 단계(S30);를 포함하는 음성 인식을 통한 제조 설비 제어 방법에 있어서, 상기 명령전달 단계(S30)는, 명령의도 및 제어권한 확인단계(S31); 제어명령 분산 처리 단계(S32); 및 제어명령 송출 단계(S33);를 포함하는 것을 특징으로 하는 음성 인식을 통한 제조 설비 제어 방법을 제공한다. In order to achieve the above-mentioned purpose, the present invention provides a manufacturing facility control method through voice recognition, which comprises a speaker recognition and STT performance step (S10); an intention inference performance step (S20); and a command transmission step (S30) for transmitting a command to a linked facility, wherein the command transmission step (S30) comprises a command intention and control authority confirmation step (S31); a control command distributed processing step (S32); and a control command transmission step (S33).

Here, the command intent in the command intent and control authority confirmation step (S31) includes at least one of: the target facility, the action that the facility is to perform, the object to which the command is to be applied, or the goal to which that object is to be moved or applied.
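As a rough illustration of step S31, the four elements of the command intent can be held in a small record and checked against a per-speaker authority table. The class `CommandIntent`, the function `has_authority`, the field names, and the sample facility names are all hypothetical; the specification does not prescribe a data structure.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class CommandIntent:
    """Hypothetical container for the four intent elements named in S31."""
    facility: Optional[str] = None  # target facility
    action: Optional[str] = None    # action the facility is to perform
    target: Optional[str] = None    # object the command is applied to
    goal: Optional[str] = None      # where the object is moved or applied

    def is_valid(self) -> bool:
        # S31 requires at least one of the four elements to be present.
        return any(
            value is not None
            for value in (self.facility, self.action, self.target, self.goal)
        )


def has_authority(speaker: str, intent: CommandIntent,
                  permissions: Dict[str, Set[str]]) -> bool:
    """Check that the verified speaker may control the intended facility."""
    if not intent.is_valid() or intent.facility is None:
        return False
    return intent.facility in permissions.get(speaker, set())
```

In this sketch a command is passed on to distributed processing only when `has_authority` returns `True`; how a command without an explicit facility should be handled is left open here.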

Here, the control command distributed processing step (S32) includes at least one of a control command parallel processing step (S321) and a control command sequential processing step (S322).

Here, in the control command parallel processing step (S321), control commands numbering fewer than a preset number are processed in parallel to prevent delays in the operation of the manufacturing facilities.

Here, in the control command sequential processing step (S322), to prevent the malfunction that could occur when duplicate or conflicting commands are delivered to a single facility at the same time, commands newly delivered in excess of the preset number are placed in a standby state.

Here, the control commands waiting in the control command sequential processing step (S322) can be canceled all at once through a 'stop' command.
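The interplay of steps S321 and S322 and the 'stop' command can be sketched as follows. This is only one possible reading: the class `CommandDispatcher` and its methods are hypothetical, a counting semaphore from the Python standard library is assumed as the admission mechanism, and the actual transmission of a command to a facility is omitted.

```python
import threading
from collections import deque


class CommandDispatcher:
    """Admit up to `limit` commands at once; queue the rest (assumed design)."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)  # free parallel slots
        self._lock = threading.Lock()
        self.running = []       # commands currently being processed (S321)
        self.waiting = deque()  # commands held in standby (S322)

    def submit(self, command: str) -> str:
        with self._lock:
            # Non-blocking acquire: succeeds only while a slot is free.
            if self._slots.acquire(blocking=False):
                self.running.append(command)
                return "running"
            self.waiting.append(command)
            return "waiting"

    def finish(self, command: str):
        """Mark a command done; promote the oldest waiting command, if any."""
        with self._lock:
            self.running.remove(command)
            if self.waiting:
                promoted = self.waiting.popleft()
                self.running.append(promoted)  # slot is handed over directly
                return promoted
            self._slots.release()
            return None

    def stop(self):
        """'Stop' command: cancel every waiting command in one batch."""
        with self._lock:
            cancelled = list(self.waiting)
            self.waiting.clear()
            return cancelled
```

With a limit of two, a third and fourth command submitted while two are running are placed in standby; `stop()` discards the standby commands without touching those already running. Whether 'stop' should also interrupt running commands is not stated in this passage, so the sketch leaves them alone.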

본 발명에 따른 음성 인식을 통한 제조 설비 제어 방법은 제조 설비를 제어하기 위한 제어명령을 분산하여 처리함으로써, 설비 제어 명령의 병목 현상 및 오류를 개선하도록 한다. The method for controlling manufacturing equipment using voice recognition according to the present invention improves bottlenecks and errors in equipment control commands by distributing and processing control commands for controlling manufacturing equipment.

FIG. 1 is an exemplary diagram of a system for performing a method for controlling manufacturing facilities through voice recognition according to an embodiment of the present disclosure.
FIG. 2 is a flowchart of a method for controlling manufacturing facilities through voice recognition according to an embodiment of the present invention.
FIG. 3 is a detailed flowchart of a method for controlling manufacturing facilities through voice recognition according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram showing a process of performing speaker recognition in a method for controlling manufacturing facilities through voice recognition according to an embodiment of the present invention.
FIG. 5 is an exemplary diagram showing a process of performing intent inference in a method for controlling manufacturing facilities through voice recognition according to an embodiment of the present invention.
FIG. 6 is a general schematic diagram of an exemplary computing environment in which embodiments of the present invention may be implemented.

Various embodiments and/or aspects are now disclosed with reference to the drawings. In the following description, for purposes of explanation, numerous specific details are set forth to aid an overall understanding of one or more aspects. However, a person of ordinary skill in the art of the present disclosure will recognize that such aspect(s) may be practiced without these specific details. The following description and the accompanying drawings set forth certain illustrative aspects of the one or more aspects in detail. These aspects are illustrative, however; some of the various ways in which the principles of the various aspects may be employed can be used, and the description is intended to include all such aspects and their equivalents. Specifically, the terms "embodiment," "example," "aspect," and "illustration" as used herein are not necessarily to be construed to mean that any aspect or design described is preferred or advantageous over other aspects or designs.

이하, 도면 부호에 관계없이 동일하거나 유사한 구성 요소는 동일한 참조 번호를 부여하고 이에 대한 중복되는 설명은 생략한다. 또한, 본 명세서에 개시된 실시예를 설명함에 있어서 관련된 공지 기술에 대한 구체적인 설명이 본 명세서에 개시된 실시예의 요지를 흐릴 수 있다고 판단되는 경우 그 상세한 설명을 생략한다. 또한, 첨부된 도면은 본 명세서에 개시된 실시예를 쉽게 이해할 수 있도록 하기 위한 것일 뿐, 첨부된 도면에 의해 본 명세서에 개시된 기술적 사상이 제한되지 않는다.Hereinafter, regardless of the drawing symbols, identical or similar components are given the same reference numerals and redundant descriptions thereof are omitted. In addition, when describing the embodiments disclosed in this specification, if it is determined that a detailed description of a related known technology may obscure the gist of the embodiments disclosed in this specification, the detailed description thereof is omitted. In addition, the attached drawings are only intended to facilitate easy understanding of the embodiments disclosed in this specification, and the technical ideas disclosed in this specification are not limited by the attached drawings.

비록 제 1, 제 2 등이 다양한 소자나 구성요소들을 서술하기 위해서 사용되나, 이들 소자나 구성요소들은 이들 용어에 의해 제한되지 않음은 물론이다. 이들 용어들은 단지 하나의 소자나 구성요소를 다른 소자나 구성요소와 구별하기 위하여 사용하는 것이다. 따라서, 이하에서 언급되는 제 1 소자나 구성요소는 본 발명의 기술적 사상 내에서 제 2 소자나 구성요소 일 수도 있음은 물론이다.Although the terms first, second, etc. are used to describe various elements or components, it is to be understood that these elements or components are not limited by these terms. These terms are merely used to distinguish one element or component from another element or component. Accordingly, it is to be understood that a first element or component referred to below may also be a second element or component within the technical spirit of the present invention.

다른 정의가 없다면, 본 명세서에서 사용되는 모든 용어(기술 및 과학적 용어를 포함)는 본 발명이 속하는 기술분야에서 통상의 지식을 가진 자에게 공통적으로 이해될 수 있는 의미로 사용될 수 있을 것이다. 또 일반적으로 사용되는 사전에 정의되어 있는 용어들은 명백하게 특별히 정의되어 있지 않는 한 이상적으로 또는 과도하게 해석되지 않는다.Unless otherwise defined, all terms (including technical and scientific terms) used in this specification may be used with a meaning that can be commonly understood by a person of ordinary skill in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries shall not be ideally or excessively interpreted unless explicitly specifically defined.

더불어, 용어 "또는"은 배타적 "또는"이 아니라 내포적 "또는"을 의미하는 것으로 의도된다. 즉, 달리 특정되지 않거나 문맥상 명확하지 않은 경우에, "X는 A 또는 B를 이용한다"는 자연적인 내포적 치환 중 하나를 의미하는 것으로 의도된다. 즉, X가 A를 이용하거나; X가 B를 이용하거나; 또는 X가 A 및 B 모두를 이용하는 경우, "X는 A 또는 B를 이용한다"가 이들 경우들 어느 것으로도 적용될 수 있다. 또한, 본 명세서에 사용된 "및/또는"이라는 용어는 열거된 관련 아이템들 중 하나 이상의 아이템의 가능한 모든 조합을 지칭하고 포함하는 것으로 이해되어야 한다. Additionally, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or." That is, unless otherwise specified or clear from the context, "X employs A or B" is intended to mean either of the natural inclusive permutations. That is, if X employs A; X employs B; or X employs both A and B, "X employs A or B" can apply to any of these cases. Furthermore, the term "and/or" as used herein should be understood to refer to and include all possible combinations of one or more of the associated items listed.

또한, "포함한다" 및/또는 "포함하는"이라는 용어는, 해당 특징 및/또는 구성요소가 존재함을 의미하지만, 하나 이상의 다른 특징, 구성요소 및/또는 이들의 그룹의 존재 또는 추가를 배제하지 않는 것으로 이해되어야 한다. 또한, 달리 특정되지 않거나 단수 형태를 지시하는 것으로 문맥상 명확하지 않은 경우에, 본 명세서와 청구범위에서 단수는 일반적으로 "하나 또는 그 이상"을 의미하는 것으로 해석되어야 한다.Also, it should be understood that the terms "comprises" and/or "comprising" mean the presence of the features and/or components, but do not preclude the presence or addition of one or more other features, components, and/or groups thereof. Also, unless otherwise specified or clear from the context to refer to the singular form, the singular form as used in the specification and claims should generally be construed to mean "one or more."

더불어, 본 명세서에서 사용되는 용어 "정보" 및 "데이터"는 종종 서로 상호교환 가능하도록 사용될 수 있다.Additionally, the terms “information” and “data” as used herein may often be used interchangeably.

When a component is said to be "connected" or "coupled" to another component, it should be understood that it may be directly connected or coupled to that other component, but that other components may exist in between. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

이하의 설명에서 사용되는 구성 요소에 대한 접미사 "모듈" 및 "부"는 명세서 작성의 용이함만이 고려되어 부여되거나 혼용되는 것으로서 그 자체로 서로 구별되는 의미 또는 역할을 갖는 것은 아니다.The suffixes "module" and "part" used for components in the following description are given or used interchangeably only for the convenience of writing the specification, and do not in themselves have distinct meanings or roles.

본 개시의 목적 및 효과, 그리고 그것들을 달성하기 위한 기술적 구성들은 첨부되는 도면과 함께 상세하게 후술되어 있는 실시예들을 참조하면 명확해질 것이다. 본 개시를 설명하는데 있어서 공지 기능 또는 구성에 대한 구체적인 설명이 본 개시의 요지를 불필요하게 흐릴 수 있다고 판단되는 경우에는 그 상세한 설명을 생략할 것이다. 그리고 후술되는 용어들은 본 개시에서의 기능을 고려하여 정의된 용어들로써 이는 사용자, 운용자의 의도 또는 관례 등에 따라 달라질 수 있다.The purpose and effect of the present disclosure, and the technical configurations for achieving them, will become clear with reference to the embodiments described in detail below together with the attached drawings. In explaining the present disclosure, if it is judged that a specific description of a known function or configuration may unnecessarily obscure the gist of the present disclosure, the detailed description thereof will be omitted. In addition, the terms described below are terms defined in consideration of the functions in the present disclosure, and these may vary depending on the intention or custom of the user or operator.

그러나 본 개시는 이하에서 개시되는 실시예들에 한정되는 것이 아니라 서로 다른 다양한 형태로 구현될 수 있다. 단지 본 실시예들은 본 개시가 완전하도록 하고, 본 개시가 속하는 기술분야에서 통상의 지식을 가진 자에게 개시의 범주를 완전하게 알려주기 위해 제공되는 것이며, 본 개시는 청구항의 범주에 의해 정의될 뿐이다. 그러므로 그 정의는 본 명세서 전반에 걸친 내용을 토대로 내려져야 할 것이다.However, the present disclosure is not limited to the embodiments disclosed below, and may be implemented in various different forms. These embodiments are provided only to make the present disclosure complete and to fully inform those skilled in the art of the scope of the disclosure, and the present disclosure is defined only by the scope of the claims. Therefore, the definition should be made based on the contents throughout this specification.

본 개시에서 컴퓨팅 장치는 사용자 단말로부터 수신된 요청에 따라 화자 검증을 수행하고, 의도 추론을 수행하여, 설비 제어 명령을 전달할 수 있다. 구체적으로, 컴퓨팅 장치는 화자인식을 진행하는 모델, 의도추론을 수행하는 모델, 제어명령을 전달하는 모델 등을 호출할 수 있다. 그리고, 컴퓨팅 장치는 화자 검증 및 의도 추론 수행하여 설비 제어명령을 제조 설비로 전달할 수 있다. In the present disclosure, the computing device can perform speaker verification and intent inference according to a request received from a user terminal, and transmit a facility control command. Specifically, the computing device can call a model that performs speaker recognition, a model that performs intent inference, a model that transmits a control command, and the like. In addition, the computing device can perform speaker verification and intent inference and transmit a facility control command to a manufacturing facility.

이하, 첨부된 도면을 참조하여 본 발명에 따른 설비 제어 프로세스에 대하여 자세히 설명한다. Hereinafter, the equipment control process according to the present invention will be described in detail with reference to the attached drawings.

도 1은 본 개시의 몇몇 실시예에 따른 음성 인식을 통한 제조 설비 제어 방법을 수행하기 위한 예시적인 시스템을 도시한다.FIG. 1 illustrates an exemplary system for performing a method for controlling manufacturing equipment through voice recognition according to some embodiments of the present disclosure.

Referring to FIG. 1, the facility control system may include a computing device (100), a user terminal (200), a server (300), a manufacturing facility (400), and a network (N). However, these components are not all essential to implementing a system that performs speaker verification and intent inference and delivers facility control commands; such a system may have more or fewer components than those listed above.

컴퓨팅 장치(100)는 예를 들어, 마이크로프로세서, 메인프레임 컴퓨터, 디지털 프로세서, 휴대용 디바이스 또는 디바이스 제어기 등과 같은 임의의 타입의 컴퓨터 시스템 또는 컴퓨터 디바이스를 포함할 수 있다.The computing device (100) may include any type of computer system or computer device, such as, for example, a microprocessor, a mainframe computer, a digital processor, a handheld device, or a device controller.

프로세서(110)는 통상적으로 컴퓨팅 장치(100)의 전반적인 동작을 처리할 수 있다. 프로세서(110)는 컴퓨팅 장치(100)에 포함된 구성요소들을 통해 입력 또는 출력되는 신호, 데이터, 정보 등을 처리하거나 저장부(120)에 저장된 응용 프로그램을 구동함으로써, 사용자에게 적절한 정보 또는 기능을 제공 또는 처리할 수 있다.The processor (110) can typically process the overall operation of the computing device (100). The processor (110) can process signals, data, information, etc. input or output through components included in the computing device (100) or run application programs stored in the storage unit (120), thereby providing or processing appropriate information or functions to the user.

In the present disclosure, upon receiving from the user terminal (200) a request signal for performing speaker verification and intent inference and delivering a facility control command, the processor (110) may request speaker verification performance information and intent inference performance information from the server (300). Here, the speaker verification performance information and the intent inference performance information may be an interface designed to control functions provided by an operating system or a programming language. For example, this information may allow speaker verification and intent inference to be performed, through a pre-trained model, on a voice input by the user. It may be the information that the processor (110) uses to perform speaker verification and intent inference on the input voice. Depending on the embodiment, the speaker verification performance information and the intent inference performance information may include at least one of an IP (Internet Protocol) address, a host address, and port information. The processor (110) may call a learning model determined based on the speaker verification performance information and the intent inference performance information to perform speaker verification and intent inference on the input voice included in the request signal.

저장부(120)는 메모리 및/또는 영구저장매체를 포함할 수 있다. 메모리는 플래시 메모리 타입(flash memory type), 하드디스크 타입(hard disk type), 멀티미디어 카드 마이크로 타입(multimedia card micro type), 카드 타입의 메모리(예를 들어 SD 또는 XD 메모리 등), 램(Random Access Memory, RAM), SRAM(Static Random Access Memory), 롬(Read-Only Memory, ROM), EEPROM(Electrically Erasable Programmable Read-Only Memory), PROM(Programmable Read-Only Memory), 자기 메모리, 자기 디스크, 광디스크 중 적어도 하나의 타입의 저장매체를 포함할 수 있다.The storage (120) may include memory and/or a permanent storage medium. The memory may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (e.g., an SD or XD memory, etc.), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, and an optical disk.

통신부(130)는 컴퓨팅 장치(100)와 통신 시스템 사이, 컴퓨팅 장치(100)와 사용자 단말(200) 사이, 컴퓨팅 장치(100)와 서버(300) 사이 또는 컴퓨팅 장치(100)와 네트워크(N) 사이의 통신을 가능하게 하는 하나 이상의 모듈을 포함할 수 있다. 이러한 통신부(130)는 이동통신 모듈, 유선 인터넷 모듈 및 무선 인터넷 모듈 중 적어도 하나를 포함할 수 있다.The communication unit (130) may include one or more modules that enable communication between a computing device (100) and a communication system, between a computing device (100) and a user terminal (200), between a computing device (100) and a server (300), or between a computing device (100) and a network (N). The communication unit (130) may include at least one of a mobile communication module, a wired Internet module, and a wireless Internet module.

사용자 단말(200)은, 사용자가 소유하고 있는 PC(personal computer), 노트북(note book), 모바일 단말기(mobile terminal), 스마트 폰(smart phone), 태블릿 PC(tablet pc) 등을 포함할 수 있으며, 유/무선 네트워크에 접속할 수 있는 모든 종류의 단말을 포함할 수 있다.The user terminal (200) may include a personal computer (PC), notebook, mobile terminal, smart phone, tablet PC, etc. owned by the user, and may include all types of terminals that can connect to wired/wireless networks.

본 개시에서, 사용자 단말(200)은 사용자로부터 음성을 입력받을 수 있다. 사용자 단말(200)은 화자 검증 수행 정보, 의도 추론 수행 정보를 컴퓨팅 장치(100)에 전송함으로써 제조설비(400)로 명령을 전달할 수 있다. 결과가 수신됨에 따라, 사용자 단말(200)은 수신된 결과를 디스플레이할 수도 있다. 한편, 상기 사용자 단말(200)은 관리자 단말(210) 내지는 작업자 단말(220)을 포함하도록 구성될 수도 있다. In the present disclosure, the user terminal (200) can receive a voice input from a user. The user terminal (200) can transmit a command to the manufacturing facility (400) by transmitting speaker verification performance information and intention inference performance information to the computing device (100). As the result is received, the user terminal (200) can also display the received result. Meanwhile, the user terminal (200) may be configured to include an administrator terminal (210) or an operator terminal (220).

본 개시의 몇몇 실시예에 따르면, 컴퓨팅 장치(100) 및 사용자 단말(200)은 하나의 구성으로 구현될 수도 있다.According to some embodiments of the present disclosure, the computing device (100) and the user terminal (200) may be implemented in a single configuration.

서버(300)는, 예를 들어, 마이크로프로세서, 메인프레임 컴퓨터, 디지털 프로세서, 휴대용 디바이스 및 디바이스 제어기 등과 같은 임의의 타입의 컴퓨터 시스템 또는 컴퓨터 디바이스를 포함할 수 있다.The server (300) may include any type of computer system or computer device, such as, for example, a microprocessor, a mainframe computer, a digital processor, a handheld device, and a device controller.

본 개시의 몇몇 실시예에 따르면, 서버(300)는 화자 인식 모델, 의도 추론 모델 등을 저장하고 있을 수 있다. 일례로, 화자 인식 모델, 의도 추론 모델은 동적으로 확장될 수 있다. 여기서, 동적으로 확장된다는 의미는 관리자의 개입이 없이도 자동으로 생성/추가 된다는 의미일 수 있다. 예를 들어, 화자 인식 모델, 의도 추론 모델은 데이터 화자 검증, 의도 추론을 위한 요청 신호에 기초하여 추가로 생성될 수도 있다. 추가로 생성된 화자 인식 모델, 의도 추론 모델은 화자 검증 요청,의도 추론 요청 정보를 서버(300)로 전송할 수 있다. 따라서, 서버(300)는 모든 화자 인식 모델, 의도 추론 모델 각각에 대한 연결 정보를 저장하고 있을 수 있다. According to some embodiments of the present disclosure, the server (300) may store speaker recognition models, intent inference models, etc. For example, the speaker recognition model and the intent inference model may be dynamically expanded. Here, dynamically expanded may mean that they are automatically generated/added without intervention of an administrator. For example, the speaker recognition model and the intent inference model may be additionally generated based on a request signal for data speaker verification and intent inference. The additionally generated speaker recognition model and intent inference model may transmit speaker verification request and intent inference request information to the server (300). Accordingly, the server (300) may store connection information for each of all speaker recognition models and intent inference models.

본 개시의 몇몇 실시예에 따르면, 컴퓨팅 장치(100) 및 서버(300)는 하나의 개체로서 구현될 수도 있다. 이에 따라, 컴퓨팅 장치(100), 서버(300) 및 화자 인식 모델, 의도 추론 모델은 하나의 구성으로서 구현될 수도 있다. 예를 들어, 컴퓨팅 장치(100)는 서버(300)에 포함되어 하나의 구성으로 동작할 수도 있다. 다른 예시로, 컴퓨팅 장치(100) 및 화자 인식 모델, 의도 추론 모델은 서버(300)에 포함되어 하나의 구성으로 동작할 수 있다. According to some embodiments of the present disclosure, the computing device (100) and the server (300) may be implemented as a single entity. Accordingly, the computing device (100), the server (300), and the speaker recognition model and the intent inference model may be implemented as a single configuration. For example, the computing device (100) may be included in the server (300) and operate as a single configuration. As another example, the computing device (100) and the speaker recognition model and the intent inference model may be included in the server (300) and operate as a single configuration.

상기 제조설비(400)는 컴퓨팅 장치(100), 사용자 단말(200) 내지는 서버(300)와 네트워크(N)로 연결되며, 컴퓨팅 장치(100), 사용자 단말(200) 내지는 서버(300)에서 전달되는 명령에 따라, 동작을 수행한다. 상기 제조설비(400)는 명령에 따른 동작을 수행하기 위하여 컴퓨팅 장치(100)에 대응되는 동일 내지 유사한 구성들이 포함될 수도 있을 것이다. 다만, 상기 제조설비(400)는 컴퓨팅 장치(100)의 모든 구성을 포함하지는 않으며, 제어명령에 따라 동작을 수행할 수 있을 만큼의 구성을 갖는 것이 적절할 것이다. The above manufacturing facility (400) is connected to a computing device (100), a user terminal (200) or a server (300) through a network (N), and performs operations according to commands transmitted from the computing device (100), the user terminal (200) or the server (300). The above manufacturing facility (400) may include identical or similar components corresponding to the computing device (100) in order to perform operations according to the commands. However, the above manufacturing facility (400) does not include all components of the computing device (100), and it would be appropriate to have components sufficient to perform operations according to control commands.

네트워크(N)는 유선 및 무선 등과 같은 그 통신 양태를 가리지 않고 구성될 수 있으며, 단거리 통신망(PAN:Personal Area Network), 근거리 통신망(WAN:Wide Area Network) 등 다양한 통신망으로 구성될 수 있다. 또한, 상기 네트워크는 공지의 월드와이드웹(WWW:World Wide Web)일 수 있으며, 적외선(IrDA:Infrared Data Association) 또는 블루투스(Bluetooth)와 같이 단거리 통신에 이용되는 무선 전송 기술을 이용할 수도 있다. 본 명세서에서 설명된 기술들은 위에서 언급된 네트워크들뿐만 아니라, 다른 네트워크들에서도 사용될 수 있다.The network (N) can be configured regardless of its communication mode, such as wired or wireless, and can be configured with various communication networks, such as a personal area network (PAN) and a wide area network (WAN). In addition, the network can be the well-known World Wide Web (WWW), and can also use a wireless transmission technology used for short-distance communication, such as infrared (IrDA: Infrared Data Association) or Bluetooth. The technologies described in this specification can be used not only in the networks mentioned above, but also in other networks.

As illustrated in FIG. 2, the method for controlling manufacturing facilities through voice recognition according to the present invention includes: a speaker recognition and STT (speech-to-text) performance step (S10) of verifying whether the input voice belongs to a registered speaker and performing STT on the input voice to convert it into text; an intent inference performance step (S20) of inferring the intent of the converted text; and a command transmission step (S30) of delivering a command to a linked facility.

The above steps may be performed by the computing device (100), the user terminal (200), the server (300), or the manufacturing facility (400). Some of the processes involved, such as speaker recognition, intent inference, and command delivery, may be performed by a pre-trained model based on a network model such as a convolutional neural network (CNN), a recurrent neural network (RNN), an auto encoder, a generative adversarial network (GAN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a Q network, a U network, or a Siamese network.

In the speaker recognition and STT step (S10), as in the example illustrated in FIG. 4, a series of processes is performed to verify whether the input voice belongs to a registered speaker and to perform STT on the input voice to convert it into text. Accordingly, the speaker recognition and STT step (S10) may include: a process of inputting audio of a target speaker; a process of extracting and storing features of the target speaker's audio; a process of receiving a user's voice; a process of distributing the processing of the input user's voice; a process of extracting features of the input user's voice; a process of calculating the similarity between the target speaker's audio and the input user's voice; and a process of performing STT.

The process of inputting the target speaker's audio includes a series of processes of receiving audio from a user who has control authority over the facility, that is, the target speaker, and building a database for speaker recognition. Preferably, multiple audio samples of the target speaker are input, which improves the accuracy of later speaker verification. The audio may be in a format such as wav, mp4, mp3, Mpc, msv, ogg, AAC, FLAC, ALAC, AIFF, or DSD, and is preferably in wav format. In this process, different control authority over each facility may be set for each target speaker. Meanwhile, the process of extracting and storing features of the target speaker's audio refers to a series of processes of extracting, from the input audio of the target speaker, numerical values (features) that represent the unique characteristics of the sound, and storing them.
Here, a feature is a numerical value that discards unnecessary information, including noise, and retains only the important characteristics, including frequency. To obtain the features, the input audio of the target speaker is divided into short segments of about 25 ms, each of which is called a frame. A Fourier transform is applied to each frame to extract the frequency information contained in that segment. The result of applying the Fourier transform to every frame may be called a spectrum. A mel filter bank, which analyzes the frequency bands to which human speech perception is sensitive in fine detail and the remaining bands more coarsely, may be applied to this spectrum; the result is called a mel spectrum, and taking its logarithm gives the log-mel spectrum. The feature according to the target speaker audio feature extraction and storage step (S12) of the present invention may be obtained by applying an inverse Fourier transform to the log-mel spectrum, converting the frequency-domain information into a new time domain. In the process of extracting and storing features of the target speaker's audio, different control authority over each facility may also be set and stored for each target speaker. Meanwhile, the process of receiving the user's voice refers to a series of processes of receiving the voice of a user.
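The feature-extraction steps described above (25 ms framing, Fourier transform, mel filter bank, logarithm, and a final inverse transform) can be sketched as follows. This is a minimal pure-Python illustration, not the patent's implementation: the hop length, window, filter count, and all function names are assumptions, and the last step uses a DCT-II, the transform conventionally used for mel-frequency cepstral coefficients, in place of the inverse Fourier transform named in the text. A production system would use an optimized signal-processing library rather than a naive DFT.

```python
import cmath
import math

def frame_signal(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Split samples into overlapping frames of about frame_ms milliseconds."""
    size = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)  # hop length is an assumption
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, hop)]

def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of one Hamming-windowed frame."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_bins, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = (sample_rate / 2) / (n_bins - 1)
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(n_bins):
            f = b * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel(spectrum, bank):
    """Log of the filter-bank energies (small offset avoids log(0))."""
    return [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
            for row in bank]

def cepstrum(log_mel_vec, n_coeffs):
    """DCT-II of the log-mel vector (stand-in for the inverse transform)."""
    n = len(log_mel_vec)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(log_mel_vec))
            for k in range(n_coeffs)]

def extract_features(samples, sample_rate, n_filters=12, n_coeffs=8):
    """Return one feature vector per frame."""
    frames = frame_signal(samples, sample_rate)
    if not frames:
        return []
    bank = mel_filter_bank(n_filters, len(frames[0]) // 2 + 1, sample_rate)
    return [cepstrum(log_mel(power_spectrum(f), bank), n_coeffs) for f in frames]
```

Calling `extract_features(samples, 8000)` on a list of audio samples yields one short vector per 25 ms frame; these vectors are the kind of per-frame features that a later similarity calculation would consume.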
The voice received in this process is that of a user who currently wishes to control a large-scale manufacturing facility; this user may be a target speaker who supplied audio in the process of inputting the target speaker's audio, or may be a different user. Which of the two is the case is verified in the processes described below. The user's voice may be in a format such as wav, mp4, mp3, Mpc, msv, ogg, AAC, FLAC, ALAC, AIFF, or DSD, and is preferably in wav format. Meanwhile, the process of distributing the processing of the input user's voice distributes voices that arrive simultaneously for speaker verification, preventing overload or errors in speaker verification. Because controlling a large-scale manufacturing facility involves voices arriving simultaneously from multiple users, bottlenecks are minimized by applying the semaphore concept: requests up to a certain number are processed in parallel, and requests beyond that number wait before being processed. This process is performed using a semaphore, a method of restricting access to shared resources in a multiprogramming environment, and is based on mutual exclusion. For example, an integer variable manipulated by two atomic functions controls access to shared resources, so that only a limited number of processes or threads can access n shared resources.
The semaphore count is 1 or more, and adjusting the count controls the number of processes or threads allowed to enter. Accordingly, the process of distributing the processing of the input user's voice may include at least one of a process of processing input voices in parallel and a process of processing input voices sequentially. In the parallel process, input voices up to a preset number are processed in parallel, that is, a fixed number of input voices are processed simultaneously. A critical section may be set in this process; the section in which voices are processed simultaneously up to the set limit may be called the critical section. For example, if the preset number is 5, five input voices are processed simultaneously so that subsequent speaker recognition proceeds smoothly. In the sequential process, input voices that newly arrive beyond the preset number are placed in a waiting state to prevent bottlenecks. Such a voice may spin in an empty loop and enter the critical section once entry becomes possible. For example, if the preset number is 5, five input voices are processed simultaneously, and the sixth and later input voices are placed in a waiting state. Meanwhile, the process of extracting features of the input user's voice includes a series of processes of extracting, from the user's voice received in the process of receiving the user's voice, numerical values (features) that represent the unique characteristics of the sound, and storing them.
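The semaphore-based distribution described above, with a limit of 5 simultaneous input voices and later arrivals waiting, can be sketched as follows. This is an illustrative sketch only: `VoiceRequestGate`, `verify_speaker_stub`, and `run_demo` are hypothetical names, and Python's `threading.BoundedSemaphore` puts waiting threads to sleep rather than spinning in an empty loop as the text describes.

```python
import threading
import time

MAX_PARALLEL = 5  # the preset number used in the example above

class VoiceRequestGate:
    """Lets at most `limit` voice requests into the critical section at once."""

    def __init__(self, limit=MAX_PARALLEL):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.active = 0  # requests currently inside the critical section
        self.peak = 0    # highest concurrency observed so far

    def process(self, handler, audio):
        # Entering the `with` block acquires a slot (waits if none is free);
        # leaving it releases the slot for the next waiting request.
        with self._slots:
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return handler(audio)
            finally:
                with self._lock:
                    self.active -= 1

def verify_speaker_stub(audio):
    """Hypothetical placeholder for the real speaker-verification work."""
    time.sleep(0.01)
    return len(audio)

def run_demo(n_requests=12):
    """Fire n_requests concurrent requests and report the peak concurrency."""
    gate = VoiceRequestGate()
    results = [None] * n_requests

    def worker(i):
        results[i] = gate.process(verify_speaker_stub, b"x" * i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gate.peak, results
```

With 12 concurrent requests, the observed peak never exceeds 5; the remaining requests are admitted one at a time as slots are released.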
As above, a feature is a numerical value that discards unnecessary information, including noise, and retains only the important characteristics, including frequency. To obtain the features, the input audio is divided into short segments of about 25 ms, each of which is called a frame. A Fourier transform is applied to each frame to extract the frequency information contained in that segment. The result of applying the Fourier transform to every frame may be called a spectrum. A mel filter bank, which analyzes the frequency bands to which human speech perception is sensitive in fine detail and the remaining bands more coarsely, may be applied to this spectrum; the result is called a mel spectrum, and taking its logarithm gives the log-mel spectrum. The feature according to the target speaker audio feature extraction and storage step (S12) of the present invention may be obtained by applying an inverse Fourier transform to the log-mel spectrum, converting the frequency-domain information into a new time domain. Meanwhile, the process of calculating the similarity between the target speaker's audio and the input user's voice calculates that similarity to determine whether the user who supplied the voice in the process of receiving the user's voice has facility control authority. The similarity may be calculated by comparing a weighted-sum model of N normal distributions of the features of the target speaker's audio with a weighted-sum model of N normal distributions of the features of the input voice.
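A weighted sum of N normal distributions is a Gaussian mixture model. The text does not specify how the two mixture models are compared, so the sketch below substitutes a simpler and widely used scoring approach: it evaluates the average log-likelihood of the input voice's feature vectors under the target speaker's mixture model and compares it with a threshold. The diagonal covariances, the threshold, and all function names are assumptions made for illustration.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance normal distribution at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, gmm):
    """Log of the weighted sum of N normal densities.

    gmm is a list of (weight, mean_vector, variance_vector) tuples.
    """
    terms = [math.log(w) + log_gaussian(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    # log-sum-exp keeps the computation numerically stable
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def similarity_score(features, speaker_gmm):
    """Average per-frame log-likelihood of the input features under the model."""
    return sum(gmm_log_likelihood(x, speaker_gmm) for x in features) / len(features)

def is_registered_speaker(features, speaker_gmm, threshold):
    """Accept the speaker when the score reaches the (assumed) threshold."""
    return similarity_score(features, speaker_gmm) >= threshold
```

For example, with a two-component model `[(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [3.0, 3.0], [1.0, 1.0])]`, feature vectors close to either mean score far higher than vectors far from both, so a threshold between the two scores separates them.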
Meanwhile, the process of performing STT converts into text the input voice that has passed through the processes of receiving the user's voice, distributing its processing, extracting its features, and calculating the similarity with the target speaker's audio, and passes the converted text to the intent inference step (S20). A known STT model may be used for this conversion. STT (speech-to-text) refers to processing in which a computer interprets spoken language and converts its content into text data. The voice to be converted may be in Korean or in various other languages.

In the intent inference step (S20), as illustrated in FIG. 5, a series of processes is performed to infer the intent of the text converted in the speaker recognition and STT step (S10). Intent inference may use methods such as pattern matching, morphological analysis, machine learning, and integration with external services such as ChatGPT; which of these methods are applied, and in what priority order, can be configured.
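One way to picture configurable methods with an on/off switch and a priority order is a simple chain that tries each enabled method in turn and returns the first result. The sketch below is hypothetical: the rule patterns, intent labels, and the `Method` and `infer_intent` names are illustrative, and `keyword_inferrer` merely stands in for a heavier method such as a machine-learning model or an external service.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Inferrer = Callable[[str], Optional[str]]

def pattern_inferrer(text: str) -> Optional[str]:
    """Rule-based matching; patterns and intent labels are hypothetical."""
    rules = [
        (r"\bstop\b", "STOP"),
        (r"\bpick up\b", "PICK_UP"),
        (r"\brotate\b", "ROTATE"),
    ]
    for pattern, intent in rules:
        if re.search(pattern, text, re.IGNORECASE):
            return intent
    return None

def keyword_inferrer(text: str) -> Optional[str]:
    """Stand-in for a heavier method (e.g. a trained model or external API)."""
    words = text.lower().split()
    if "put" in words and "down" in words:
        return "PUT_DOWN"
    return None

@dataclass
class Method:
    name: str
    infer: Inferrer
    priority: int        # lower number = tried earlier
    enabled: bool = True

def infer_intent(text: str, methods: List[Method]) -> Optional[Tuple[str, str]]:
    """Try enabled methods in priority order; return (method name, intent)."""
    active = sorted((m for m in methods if m.enabled), key=lambda m: m.priority)
    for method in active:
        intent = method.infer(text)
        if intent is not None:
            return method.name, intent
    return None
```

Disabling a method or changing its `priority` changes which method answers first without touching the methods themselves.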

According to one embodiment, the intent inference step (S20) may include: a process of requesting API connection information from a server; a process of, when the API connection information is received from the server, inferring the intent by calling an API determined based on that information; and a process of passing the inference result to the command transmission step (S30).

Intent may be inferred by applying an intent inference model, which may be a network model such as a convolutional neural network (CNN), a recurrent neural network (RNN), an autoencoder, a generative adversarial network (GAN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a Q network, a U network, or a Siamese network.

In the command transmission step (S30), a series of processes is performed to transmit a command to the linked facility. Accordingly, the command transmission step (S30) may include a command intent and control authority confirmation step (S31), a control command distribution step (S32), and a control command transmission step (S33).

In the command intent and control authority confirmation step (S31), it is confirmed, based on the intent inference result, both that the user's intent is a command to control a facility and that the user has the control authority to do so. The command intent in step (S31) may include at least one of: (a) the target facility, (b) the action the facility is to perform, (c) the object to which the command is applied, and (d) the destination to which the object is to be moved or applied. For example, the command intent may be a command such as ['stop' 'equipment 1'], ['pick up' 'object 1' with 'robot arm 1'], ['rotate' 'robot arm 1' '90 degrees to the right'], or ['put down' 'object A' at 'location B' with 'robot arm 1']. In the last example, 'robot arm 1' is the target facility, 'object A' is the object to which the command is applied, 'location B' is the destination, and 'put down' is the action the facility is to perform.
Meanwhile, in step (S31), it is determined whether two conditions hold at the same time: that the user has a command intent, and that the user has control authority over the operation of the relevant part of the relevant facility. The control command distribution step (S32) described below is performed only when both conditions are satisfied.
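The four components of a command intent and the two-condition check of step (S31) can be represented compactly. In the sketch below, the `CommandIntent` fields mirror components (a) to (d); the permission table, speaker names, and function names are hypothetical and only illustrate that both an intent and matching authority must be present before processing continues.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CommandIntent:
    facility: str                      # (a) target facility
    action: str                        # (b) action the facility is to perform
    obj: Optional[str] = None          # (c) object the command applies to
    destination: Optional[str] = None  # (d) destination for the object

# Hypothetical permission table: speaker -> facility -> allowed actions
PERMISSIONS = {
    "operator_kim": {"robot arm 1": {"pick up", "put down", "rotate"}},
    "operator_lee": {"equipment 1": {"stop"}},
}

def has_authority(speaker, intent, permissions=PERMISSIONS):
    """True if the speaker may perform this action on this facility."""
    allowed = permissions.get(speaker, {}).get(intent.facility, set())
    return intent.action in allowed

def confirm(speaker, intent):
    """S31: proceed only if a command intent exists AND the speaker is authorized."""
    return intent is not None and has_authority(speaker, intent)
```

For the command ['put down' 'object A' at 'location B' with 'robot arm 1'], the intent would be `CommandIntent("robot arm 1", "put down", "object A", "location B")`, and `confirm` returns `True` only for a speaker whose permissions include that action on that facility.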

In the control command distribution step (S32), control commands are distributed for processing so that overload or errors in the manufacturing facility are prevented. Accordingly, the control command distribution step (S32) may include at least one of a control command parallel processing step (S321) and a control command sequential processing step (S322).

In the control command parallel processing step (S321), control commands up to a preset number are processed in parallel, which prevents delays in the operation of the manufacturing facility. For example, if the number set in the control command distribution step (S32) is 5, five control commands are processed simultaneously so that control commands issued to the manufacturing facility are delivered smoothly.

In the control command sequential processing step (S322), to prevent malfunction when duplicate or conflicting commands are delivered to a single facility at the same time, commands that newly arrive beyond the preset number are placed in a standby state. For example, if the number set in the control command sequential processing step (S322) is 5, five control commands are processed simultaneously, and the sixth and later control commands are placed in standby to prevent bottlenecks. In performing step (S322), the processing limit may be imposed using a known type of model or a pre-trained model.

Meanwhile, in performing the control command sequential processing step (S322), a 'stop' command may cancel in bulk only the control commands that are waiting, excluding those currently being processed. For example, if the number set in step (S322) is 5, five control commands are processed simultaneously, and only the sixth and later control commands held in standby are cancelled together by a command such as 'stop', which makes the facility easier to control when a malfunction occurs.
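The behavior described for steps (S321) and (S322), including the bulk cancellation of waiting commands, can be modeled with a small bookkeeping class. This is a simplified, single-threaded sketch under assumed names (`CommandDispatcher`, `submit`, `finish`, `stop`); it tracks which commands are running and which are in standby, and does not actually drive any facility.

```python
from collections import deque

class CommandDispatcher:
    """Tracks running and standby control commands under a fixed limit."""

    def __init__(self, limit=5):
        self.limit = limit
        self.running = []       # commands currently being processed
        self.pending = deque()  # commands held in standby, in arrival order

    def submit(self, command):
        """Start the command if a slot is free; otherwise place it in standby."""
        if len(self.running) < self.limit:
            self.running.append(command)
            return "running"
        self.pending.append(command)
        return "standby"

    def finish(self, command):
        """Mark a running command as done and promote the oldest standby one."""
        self.running.remove(command)
        if self.pending:
            self.running.append(self.pending.popleft())

    def stop(self):
        """Cancel every standby command; running commands are left untouched."""
        cancelled = list(self.pending)
        self.pending.clear()
        return cancelled
```

With a limit of 5, submitting seven commands leaves five running and two in standby; finishing one running command promotes the sixth, and a subsequent `stop()` cancels only the seventh.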

In the control command transmission step (S33), the control command whose command intent and control authority were confirmed in step (S31) and which was distributed in the control command distribution step (S32) is transmitted to the manufacturing facility so that the facility operates.

FIG. 6 illustrates a general schematic diagram of an exemplary computing environment in which embodiments of the present disclosure may be implemented.

Although the present disclosure has been described above generally in the context of computer-executable instructions that may run on one or more computers, those skilled in the art will appreciate that the present disclosure may also be implemented in combination with other program modules and/or as a combination of hardware and software.

In general, modules herein include routines, procedures, programs, components, data structures, and the like that perform particular tasks or implement particular abstract data types. Furthermore, those skilled in the art will appreciate that the methods of the present disclosure may be practiced with other computer system configurations, including single-processor or multiprocessor computer systems, minicomputers, and mainframe computers, as well as personal computers, handheld computing devices, and microprocessor-based or programmable consumer electronics, each of which may operate in connection with one or more associated devices.

The described embodiments of the present disclosure may also be practiced in distributed computing environments where certain tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

A computer typically includes a variety of computer-readable media. Media accessible by a computer include volatile and nonvolatile media, transitory and non-transitory media, and removable and non-removable media. By way of example and not limitation, computer-readable media may include computer-readable storage media and computer-readable transmission media.

Computer-readable storage media include volatile and nonvolatile media, transitory and non-transitory media, and removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital video disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be accessed by a computer and used to store the desired information.

Computer-readable transmission media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and include any information delivery media. The term modulated data signal means a signal that has one or more of its characteristics set or changed so as to encode information in the signal. By way of example and not limitation, computer-readable transmission media include wired media, such as a wired network or direct-wired connection, and wireless media, such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above are also included within the scope of computer-readable transmission media.

An exemplary environment (1100) for implementing various aspects of the present disclosure is shown, including a computer (1102) that includes a processing unit (1104), a system memory (1106), and a system bus (1108). The system bus (1108) couples system components, including but not limited to the system memory (1106), to the processing unit (1104). The processing unit (1104) may be any of a variety of commercially available processors. Dual-processor and other multiprocessor architectures may also be used as the processing unit (1104).

The system bus (1108) may be any of several types of bus structures that may further interconnect to a memory bus, a peripheral bus, and a local bus using any of a variety of commercially available bus architectures. The system memory (1106) includes read-only memory (ROM) (1110) and random access memory (RAM) (1112). A basic input/output system (BIOS) is stored in nonvolatile memory (1110) such as ROM, EPROM, or EEPROM, and contains the basic routines that help transfer information between components within the computer (1102), for example during start-up. The RAM (1112) may also include high-speed RAM, such as static RAM, for caching data.

The computer (1102) also includes an internal hard disk drive (HDD) (1114) (e.g., EIDE, SATA), which may also be configured for external use in a suitable chassis (not shown); a magnetic floppy disk drive (FDD) (1116) (e.g., for reading from or writing to a removable diskette (1118)); and an optical disk drive (1120) (e.g., for reading a CD-ROM disk (1122), or for reading from or writing to other high-capacity optical media such as a DVD). The hard disk drive (1114), the magnetic disk drive (1116), and the optical disk drive (1120) may be connected to the system bus (1108) by a hard disk drive interface (1124), a magnetic disk drive interface (1126), and an optical drive interface (1128), respectively. The interface (1124) for external drive implementations includes, for example, at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies.

These drives and their associated computer-readable media provide nonvolatile storage of data, data structures, computer-executable instructions, and the like. For the computer (1102), the drives and media accommodate the storage of any data in a suitable digital format. Although the description of computer-readable storage media above refers to an HDD, a removable magnetic disk, and removable optical media such as a CD or DVD, those skilled in the art will appreciate that other types of computer-readable storage media, such as zip drives, magnetic cassettes, flash memory cards, and cartridges, may also be used in the exemplary operating environment, and that any such media may contain computer-executable instructions for performing the methods of the present disclosure.

A number of program modules, including an operating system (1130), one or more application programs (1132), other program modules (1134), and program data (1136), may be stored in the drives and the RAM (1112). All or part of the operating system, applications, modules, and/or data may also be cached in the RAM (1112). It will be appreciated that the present disclosure may be implemented with various commercially available operating systems or combinations of operating systems.

A user may enter commands and information into the computer (1102) through one or more wired/wireless input devices, for example a keyboard (1138) and a pointing device such as a mouse (1140). Other input devices (not shown) may include a microphone, an IR remote control, a joystick, a game pad, a stylus pen, a touch screen, and the like. These and other input devices are often connected to the processing unit (1104) through an input device interface (1142) that is coupled to the system bus (1108), but may be connected by other interfaces such as a parallel port, an IEEE 1394 serial port, a game port, a USB port, or an IR interface.

A monitor (1144) or other type of display device is also connected to the system bus (1108) via an interface such as a video adapter (1146). In addition to the monitor (1144), a computer typically includes other peripheral output devices (not shown), such as speakers and a printer.

The computer (1102) may operate in a networked environment using wired and/or wireless connections to the manufacturing facility (400). The manufacturing facility (400) may be connected as a conventional network node, and may have the same configuration as the computer (1102), or a similar one, to perform its operations. The logical connections depicted include wired/wireless connections to a local area network (LAN) (1152) and/or a larger network, for example a wide area network (WAN) (1154). In addition, the logical connections are not limited to the LAN (1152) or the WAN (1154) and may include any means capable of communication.

When used in a LAN networking environment, the computer (1102) is connected to the local network (1152) through a wired and/or wireless communication network interface or adapter (1156). The adapter (1156) may facilitate wired or wireless communication to the LAN (1152), which may also include a wireless access point installed on it for communicating with the wireless adapter (1156). When used in a WAN networking environment, the computer (1102) may include a modem (1158), may be connected to a communications server on the WAN (1154), or may have other means for establishing communications over the WAN (1154), such as via the Internet. The modem (1158), which may be internal or external and a wired or wireless device, is connected to the system bus (1108) via the serial port interface (1142). In a networked environment, the program modules described for the computer (1102), or portions of them, may be stored in a remote memory/storage device (1150). It will be appreciated that the network connections shown are exemplary and that other means of establishing a communications link between computers may be used.

The computer (1102) is operable to communicate with any wireless device or entity deployed and operating in wireless communication, for example a printer, a scanner, a desktop and/or portable computer, a portable data assistant (PDA), a communications satellite, any equipment or location associated with a wirelessly detectable tag, and a telephone. This includes at least Wi-Fi and Bluetooth wireless technologies. Accordingly, the communication may have a predefined structure, as in a conventional network, or may simply be ad hoc communication between at least two devices.

Wi-Fi (Wireless Fidelity) allows connection to the Internet and the like without wires. Wi-Fi is a wireless technology, similar to that used in a cell phone, that enables devices such as computers to send and receive data indoors and outdoors, anywhere within range of a base station. Wi-Fi networks use radio technologies called IEEE 802.11 (a, b, g, etc.) to provide secure, reliable, and fast wireless connectivity. Wi-Fi can be used to connect computers to each other, to the Internet, and to wired networks (which use IEEE 802.3 or Ethernet). Wi-Fi networks operate in the unlicensed 2.4 and 5 GHz radio bands, at data rates of, for example, 11 Mbps (802.11b) or 54 Mbps (802.11a), or in products that include both bands (dual band).

Those skilled in the art will appreciate that the various illustrative logical blocks, modules, processors, means, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, as various forms of program or design code (referred to herein, for convenience, as "software"), or as a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Those skilled in the art may implement the described functionality in various ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present disclosure.

The various embodiments presented herein may be implemented as a method, an apparatus, or an article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" includes a computer program or media accessible from any computer-readable device. For example, computer-readable storage media include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips), optical disks (e.g., CDs, DVDs), smart cards, and flash memory devices (e.g., EEPROMs, cards, sticks, key drives). The term "machine-readable medium" includes, but is not limited to, wireless channels and various other media capable of storing, holding, and/or carrying instruction(s) and/or data.

The description of the presented embodiments is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments without departing from the scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments presented herein, but is to be accorded the widest scope consistent with the principles and novel features presented herein.

(S10): Speaker recognition and STT performance step
(S20): Intent inference performance step
(S30): Command transmission step
(S31): Command intent and control authority confirmation step
(S32): Control command distributed processing step
(S33): Control command transmission step
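
As a reading aid, the order of the steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the `Intent` type, the `run_pipeline` function, the callable parameters (`verify_speaker`, `speech_to_text`, `infer_intent`, `has_authority`, `dispatch`), and the returned status strings are all hypothetical names introduced here. The sketch also assumes that an unregistered speaker is rejected before STT runs, which the document does not state.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Intent:
    """Hypothetical container for an inferred command intent."""
    facility: str  # target facility
    action: str    # action the facility should perform


def run_pipeline(
    audio: bytes,
    verify_speaker: Callable[[bytes], Optional[str]],
    speech_to_text: Callable[[bytes], str],
    infer_intent: Callable[[str], Optional[Intent]],
    has_authority: Callable[[str, Intent], bool],
    dispatch: Callable[[Intent], None],
) -> str:
    # S10: speaker recognition, then speech-to-text.
    speaker = verify_speaker(audio)
    if speaker is None:
        return "rejected: unregistered speaker"
    text = speech_to_text(audio)

    # S20: intent inference on the converted text.
    intent = infer_intent(text)

    # S31: confirm that the result is a command intent and that
    # the speaker has control authority.
    if intent is None:
        return "ignored: not a command"
    if not has_authority(speaker, intent):
        return "rejected: no control authority"

    # S32/S33: hand the command over for distributed processing
    # and transmission to the manufacturing facility.
    dispatch(intent)
    return "dispatched"
```

In this sketch, speaker verification, STT, and intent inference are passed in as callables because the document performs them through a connection with the server (300); any concrete service behind them is left open.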

Claims

A method for controlling a manufacturing facility through voice recognition, performed by a system including a computing device (100), a user terminal (200), a server (300), a manufacturing facility (400), and a network (N), the method comprising:
a speaker recognition and STT performance step (S10) in which the computing device (100), through connection with the server (300), verifies whether a user's voice input through the user terminal (200) belongs to a registered speaker and performs STT (Speech-to-Text) on the input voice to convert it into text; an intent inference performance step (S20) in which the computing device (100), through connection with the server (300), infers the intent of the text converted in the speaker recognition and STT performance step (S10); and a command transmission step (S30) in which the computing device (100) transmits a command to the linked facility,
wherein the command transmission step (S30) includes a command intent and control authority confirmation step (S31) in which the computing device (100) confirms whether the result of the intent inference performance step (S20) is a command intent and whether the user has control authority; a control command distributed processing step (S32) in which the computing device (100) processes the control command in a distributed manner; and a control command transmission step (S33) in which the computing device (100) transmits the control command to the manufacturing facility (400), and
wherein the control command distributed processing step (S32) includes at least one of a control command parallel processing step (S321) and a control command sequential processing step (S322), and in the control command sequential processing step (S322), to prevent malfunction when duplicate or conflicting commands are transmitted simultaneously to one facility, the computing device (100) places newly transmitted commands exceeding a preset number in a standby state, thereby preventing malfunction of the manufacturing facility.

The method of claim 1,
wherein the command intent in the command intent and control authority confirmation step (S31) includes at least one of a target facility, an action to be performed by the facility, a target to which the command is to be applied, and a target to be moved or applied.

(Deleted)

The method of claim 1,
wherein, in the control command parallel processing step (S321), the computing device (100) processes control commands numbering fewer than a preset number in parallel to prevent delay in the operation of the manufacturing facility.

(Deleted)

The method of claim 1,
wherein, when the control command sequential processing step (S322) is performed, the control commands waiting in the control command sequential processing step (S322) can be canceled in bulk by a 'stop' command issued through the user terminal (200) or the computing device (100).
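
The interplay of parallel processing (S321), the standby state (S322), and bulk cancellation by a 'stop' command can be illustrated with a minimal Python sketch. It is offered only as one possible reading under stated assumptions, not as the claimed implementation: the class name `CommandDispatcher`, its methods, and the `limit` parameter are hypothetical. The sketch also assumes that a waiting command is promoted in arrival order when an active command finishes, and that exactly `limit` commands may be active at once (the claims say only "less than" and "in excess of" a preset number).

```python
from collections import deque
from typing import Deque, List


class CommandDispatcher:
    """Hypothetical dispatcher for control commands sent to one facility."""

    def __init__(self, limit: int) -> None:
        self.limit = limit                  # the preset number (assumed inclusive)
        self.active: List[str] = []         # commands processed in parallel (S321)
        self.waiting: Deque[str] = deque()  # commands held on standby (S322)

    def submit(self, command: str) -> str:
        # Below the preset number: process in parallel to avoid delay.
        if len(self.active) < self.limit:
            self.active.append(command)
            return "active"
        # Otherwise: hold the new command on standby instead of sending it.
        self.waiting.append(command)
        return "waiting"

    def complete(self, command: str) -> None:
        # A finished command frees a slot; promote the oldest waiting one
        # (first-in, first-out order is an assumption of this sketch).
        self.active.remove(command)
        if self.waiting:
            self.active.append(self.waiting.popleft())

    def stop(self) -> List[str]:
        # 'stop' command: cancel every waiting command in bulk.
        cancelled = list(self.waiting)
        self.waiting.clear()
        return cancelled
```

Commands that are already active are left untouched by `stop()` in this sketch, since the claim refers only to commands waiting in the sequential processing step.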