KR102342314B1

KR102342314B1 - Method, server and computer program for providing answers to inquiry data based on quality data of pharmaceuticals

Info

Publication number: KR102342314B1
Application number: KR1020210097134A
Authority: KR
Inventors: 안두영; 강민호; 정재준
Original assignee: 주식회사 델버
Priority date: 2021-07-23
Filing date: 2021-07-23
Publication date: 2021-12-22
Anticipated expiration: 2041-07-23
Also published as: WO2023003169A1; KR20230015823A

Abstract

Provided are a method, a server, and a computer program for providing a response to query data based on quality data of pharmaceuticals. According to various embodiments of the present invention, the method is performed by a computing device and comprises: obtaining query data; converting the query data into metadata to obtain query metadata; selecting one or more similar data sets by searching a pharmaceutical quality database based on the query metadata; performing data grouping by classifying the query data and a plurality of quality data corresponding to the one or more similar data sets; and providing pharmaceutical analysis information corresponding to the query data based on the data grouping result.

Description

Method, server and computer program to provide answers to inquiry data based on quality data of pharmaceuticals

Various embodiments of the present invention relate to a method, a server, and a computer program for providing a response to query data based on quality data of pharmaceuticals.

Pharmaceuticals generally refer to any substance used to alter or examine a physiological system or disease state for the benefit of the user. Pharmaceuticals may encompass synthetic drugs and biopharmaceuticals. Biopharmaceuticals (i.e., biologics) are pharmaceuticals manufactured from raw materials or materials derived from humans or other living organisms, and they require special attention in terms of public health and hygiene. They include biological products, recombinant DNA products, cell culture products, cell therapy products, gene therapy products, and other products recognized by the Minister of Food and Drug Safety.

Such pharmaceuticals can be managed through a pharmaceutical quality management system (QMS), which applies statistical methods to production management to find and eliminate the causes of defective products and thereby maintain and improve quality. A pharmaceutical quality management system refers to the totality of pharmaceutical and biotech business processes focused on achieving the quality policy and quality objectives needed to meet user requirements and the stability and efficacy of pharmaceuticals.

Recently, use of the electronic quality management system (eQMS), a computerized form of the pharmaceutical quality management system, has been increasing. An eQMS is software that maximizes the efficiency of work processes by digitizing a paper-based pharmaceutical quality management system. For example, a paper-based system is inefficient in several respects, such as the possibility of losing documents, the possibility of omitting entries, and the difficulty of accessing documents, and an eQMS allows these problems to be handled efficiently.

Meanwhile, in the traditional process validation/control strategy, quality and process control strategies were established with a limited number of validation batches, so out-of-specification (OOS) and out-of-trend (OOT) results occurred frequently in commercial batches. To fundamentally solve this problem, global pharmaceutical regulatory agencies have come to actively recommend the adoption of Quality by Design (QbD).

Accordingly, technologies are being developed that use an eQMS to comply with the requirements of global pharmaceutical regulatory agencies and to predict, from large volumes of digitized quality data covering pharmaceuticals as a whole, the various issues that arise during the research, development, and production of pharmaceuticals.

Republic of Korea Registered Patent No. 10-2274363

The problem to be solved by the present invention, devised in response to the background art described above, is to build big data based on quality data of pharmaceuticals and to provide an artificial intelligence model capable of answering queries that arise during pharmaceutical research, development, and production.

The problems to be solved by the present invention are not limited to the problem mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.

To solve the problem described above, a method for providing a response to query data based on quality data of pharmaceuticals according to an embodiment of the present invention is disclosed. The method may include: obtaining query data; selecting one or more similar data sets by searching a pharmaceutical quality database based on the query data; performing data grouping by classifying the query data and a plurality of quality data corresponding to the one or more similar data sets; and providing pharmaceutical analysis information corresponding to the query data based on the data grouping result.

In an alternative embodiment, the method may further include building the pharmaceutical quality database. Building the database may include obtaining a plurality of quality data (OQD, Overall Quality Data) corresponding to each of a plurality of pharmaceuticals, and grouping each of the plurality of quality data into one or more data sets based on the critical quality profile (CQP) corresponding to each of the plurality of quality data.
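The grouping step can be illustrated with a minimal sketch. The text does not say how a CQP is represented or how records are matched, so the sketch assumes that a CQP is a set of key attributes and that records with identical CQPs fall into the same data set. The `QualityRecord` class, its fields, and the sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityRecord:
    """One overall quality data (OQD) record; the fields are hypothetical."""
    product: str
    cqp: frozenset  # critical quality profile, modeled here as a set of key attributes


def group_by_cqp(records):
    """Group records whose CQPs are identical into one data set (an assumed rule)."""
    data_sets = defaultdict(list)
    for record in records:
        data_sets[record.cqp].append(record)
    return dict(data_sets)


records = [
    QualityRecord("drug-A", frozenset({"monoclonal antibody", "liquid"})),
    QualityRecord("drug-B", frozenset({"monoclonal antibody", "liquid"})),
    QualityRecord("drug-C", frozenset({"small molecule", "tablet"})),
]
data_sets = group_by_cqp(records)
```

Exact-match grouping is the simplest possible choice; a similarity-based grouping would need a score and a cutoff instead of a dictionary key.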

In an alternative embodiment, the quality data may include general information on the pharmaceutical, profile data related to the definition of its production and quality, and empirical data corresponding to the profile data. The critical quality profile is information on the key factors that can determine the characteristics and properties of the pharmaceutical; it is used for the search and may be composed of at least a part of the quality data.

In an alternative embodiment, building the pharmaceutical quality database may include generating a correlation analysis model that derives the correlations between the elements constituting each of the plurality of quality data by learning those correlations through an association rule analysis algorithm, and building the pharmaceutical quality database by converting the plurality of quality data into metadata based on the correlations between the elements.
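The text names an association rule analysis algorithm without specifying one. As a rough illustration of the idea only, the sketch below computes support and confidence for single-item rules over a handful of records. The function name, the thresholds, and the element names are assumptions made for the example, not details from the patent.

```python
from itertools import permutations


def pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for single-item rules."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        support = count_ab / n  # share of records containing both items
        confidence = count_ab / count_a if count_a else 0.0  # P(b | a)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical quality-data elements observed together in four batches.
transactions = [
    {"high_temp", "aggregation"},
    {"high_temp", "aggregation", "low_ph"},
    {"low_ph"},
    {"high_temp", "aggregation"},
]
rules = pair_rules(transactions)
```

With this sample, only the rules linking `high_temp` and `aggregation` pass both thresholds, which is the kind of element-to-element relationship the paragraph describes.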

In an alternative embodiment, selecting the one or more similar data sets may include selecting, from the pharmaceutical quality database, one or more similar data sets whose similarity to the critical quality profile corresponding to the query data is equal to or greater than a threshold similarity score.
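A minimal sketch of threshold-based selection follows. The similarity measure is not named in the text, so Jaccard similarity over attribute sets is assumed here; `select_similar`, the data set names, and the attribute values are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_similar(query_cqp, data_sets, threshold):
    """Return names of data sets whose CQP similarity to the query meets the threshold."""
    return [
        name
        for name, cqp in data_sets.items()
        if jaccard(query_cqp, cqp) >= threshold
    ]


data_sets = {
    "set-1": {"antibody", "liquid", "2-8C"},
    "set-2": {"antibody", "lyophilized", "2-8C"},
    "set-3": {"small molecule", "tablet", "room temp"},
}
query_cqp = {"antibody", "liquid", "2-8C"}
similar = select_similar(query_cqp, data_sets, threshold=0.5)
```

The comparison uses `>=` to match the "equal to or greater than" wording, so a data set scoring exactly at the threshold is kept.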

In an alternative embodiment, the threshold similarity score may be calculated by computing similarity scores between the critical quality profiles corresponding to each of the one or more data sets and using the one or more similarity scores generated for each pair of data sets.
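The text says the threshold is derived from pairwise similarity scores but does not give the formula. One plausible reading, assumed here purely for illustration, is to average the scores over all pairs of data-set profiles. The `field_overlap` measure and the profile fields are hypothetical.

```python
from itertools import combinations
from statistics import mean


def field_overlap(p, q):
    """Fraction of fields on which two profiles (dicts) agree."""
    keys = set(p) | set(q)
    if not keys:
        return 0.0
    matches = sum(1 for k in keys if k in p and k in q and p[k] == q[k])
    return matches / len(keys)


def threshold_from_pairs(profiles, similarity=field_overlap):
    """Average the similarity scores over every pair of profiles (assumed rule)."""
    scores = [similarity(p, q) for p, q in combinations(profiles, 2)]
    if not scores:
        raise ValueError("need at least two profiles")
    return mean(scores)


profiles = [
    {"modality": "antibody", "form": "liquid"},
    {"modality": "antibody", "form": "lyophilized"},
    {"modality": "small molecule", "form": "tablet"},
]
threshold = threshold_from_pairs(profiles)
```

Other aggregates (median, a percentile) would fit the wording equally well; the mean is used only because it is the simplest.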

In an alternative embodiment, performing data grouping by classifying the query data and the plurality of quality data corresponding to the one or more similar data sets may include identifying one or more elements constituting each of the plurality of quality data corresponding to the one or more similar data sets, and classifying the plurality of quality data and the query data into one or more data groups based on the one or more elements.
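As a minimal sketch of this grouping, the code below buckets the quality data and the query together by the value each holds for one chosen element. The record layout, the element names, and the `"unknown"` fallback are assumptions for the example.

```python
from collections import defaultdict


def group_by_element(records, element):
    """Bucket record ids by the value each record holds for one element (field)."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get(element, "unknown")].append(record["id"])
    return dict(groups)


similar_records = [
    {"id": "OQD-1", "storage": "2-8C", "impurity": "aggregation"},
    {"id": "OQD-2", "storage": "2-8C", "impurity": "truncated form"},
    {"id": "OQD-3", "storage": "-20C", "impurity": "aggregation"},
]
query = {"id": "QUERY", "storage": "2-8C", "impurity": "aggregation"}

# The query is classified together with the stored records.
groups = group_by_element(similar_records + [query], "impurity")
```

Here the query lands in the same group as `OQD-1` and `OQD-3`, which identifies the records it can most directly be compared against.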

In an alternative embodiment, providing the pharmaceutical analysis information may include using the correlation analysis model to derive the correlation between the data group into which the query data is classified and each of the remaining data groups, and providing pharmaceutical analysis information corresponding to the query data based on the correlations between the data groups.

According to another embodiment of the present disclosure, a computing device that performs the method for providing a response to query data based on quality data of pharmaceuticals is disclosed. The computing device includes a storage unit that stores one or more instructions and a processor that executes the one or more instructions stored in the storage unit, and the processor may perform the above-described method by executing the one or more instructions.

According to yet another embodiment of the present invention, a computer program stored in a computer-readable recording medium is disclosed. The computer program may be combined with a computer, which is hardware, to perform the above-described method for providing a response to query data based on quality data of pharmaceuticals.

Other specific details of the invention are included in the detailed description and drawings.

According to various embodiments of the present invention, it is possible to provide an artificial intelligence model capable of answering queries that arise during pharmaceutical research, development, and production.

The effects of the present invention are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is an exemplary diagram schematically illustrating a system for providing a response to query data based on pharmaceutical quality data according to an embodiment of the present invention.
FIG. 2 is a block diagram of a server that provides a response to query data based on pharmaceutical quality data according to an embodiment of the present invention.
FIG. 3 is a flowchart exemplarily illustrating a process of acquiring a plurality of quality data and building a database according to an embodiment of the present invention.
FIG. 4 is a flowchart exemplarily illustrating a process of providing a response corresponding to query data based on that query data according to an embodiment of the present invention.
FIG. 5 is an exemplary diagram illustrating a process of generating one or more data groups based on one or more similar data sets according to an embodiment of the present invention.
FIG. 6 is a flowchart exemplarily illustrating a method of providing a response to query data based on pharmaceutical quality data according to an embodiment of the present invention.
FIG. 7 is a schematic diagram illustrating one or more network functions according to an embodiment of the present invention.

Various embodiments are now described with reference to the drawings. In this specification, various descriptions are presented to provide an understanding of the present disclosure. However, it is apparent that these embodiments may be practiced without these specific descriptions.

The terms "component," "module," "system," and the like, as used herein, refer to a computer-related entity, hardware, firmware, software, a combination of software and hardware, or software in execution. For example, a component may be, but is not limited to, a procedure running on a processor, a processor, an object, a thread of execution, a program, and/or a computer. For example, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a processor and/or thread of execution. A component may be localized within one computer or distributed between two or more computers. In addition, these components can execute from various computer-readable media having various data structures stored therein. Components may communicate via local and/or remote processes, for example in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system or distributed system, and/or data transmitted by way of the signal to other systems across a network such as the Internet).

In addition, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or." That is, unless otherwise specified or clear from context, "X employs A or B" is intended to mean any of the natural inclusive permutations. That is, if X employs A, X employs B, or X employs both A and B, then "X employs A or B" applies in any of these cases. It should also be understood that the term "and/or" as used herein refers to and includes all possible combinations of one or more of the listed related items.

Also, the terms "comprises" and/or "comprising" should be understood to mean that the feature and/or element in question is present. However, these terms do not exclude the presence or addition of one or more other features, elements, and/or groups thereof. Also, unless otherwise specified or unless it is clear from context that a singular form is intended, the singular in this specification and the claims should generally be construed to mean "one or more."

Those skilled in the art will further appreciate that the various illustrative logical blocks, configurations, modules, circuits, means, logic, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, configurations, means, logic, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application. However, such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

The description of the presented embodiments is provided to enable those of ordinary skill in the art to use or practice the present disclosure. Various modifications to these embodiments will be apparent to those of ordinary skill in the art. The generic principles defined herein may be applied to other embodiments without departing from the scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments presented herein and is to be interpreted in the widest scope consistent with the principles and novel features presented herein.

In this specification, a computer means any kind of hardware device including at least one processor and, depending on the embodiment, may be understood to also encompass the software configuration operating on that hardware device. For example, a computer may be understood to include, but is not limited to, smartphones, tablet PCs, desktops, notebooks, and the user clients and applications running on each device.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.

Each step described in this specification is described as being performed by a computer, but the subject of each step is not limited thereto, and depending on the embodiment, at least some of the steps may be performed by different devices.

FIG. 1 is an exemplary diagram schematically illustrating a system for providing a response to query data based on pharmaceutical quality data according to an embodiment of the present invention. As shown in FIG. 1, a system 10 according to embodiments of the present invention may include a server 100, a client 200, and a network. The components illustrated in FIG. 1 are exemplary; additional components may be present, or some of the illustrated components may be omitted. The server 100 and the client 200 may exchange data for the system with each other through the network.

The network according to embodiments of the present invention may use various wired communication systems such as the Public Switched Telephone Network (PSTN), x Digital Subscriber Line (xDSL), Rate Adaptive DSL (RADSL), Multi Rate DSL (MDSL), Very High Speed DSL (VDSL), Universal Asymmetric DSL (UADSL), High Bit Rate DSL (HDSL), and a local area network (LAN).

In addition, the network presented herein may use various wireless communication systems such as Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Orthogonal Frequency Division Multiple Access (OFDMA), Single Carrier-FDMA (SC-FDMA), and other systems.

The network according to embodiments of the present invention may be configured regardless of its communication mode, whether wired or wireless, and may be composed of various communication networks such as a personal area network (PAN) and a wide area network (WAN). In addition, the network may be the well-known World Wide Web (WWW), and may use a wireless transmission technology used for short-range communication, such as infrared (IrDA: Infrared Data Association) or Bluetooth. The techniques described herein may be used in the networks mentioned above as well as in other networks.

The client 200 according to an embodiment of the present invention may mean any type of node(s) in a system having a mechanism for communicating with the server 100. For example, the client 200 may include a PC, a laptop computer, a workstation, a terminal, and/or any electronic device having network connectivity. The client may also include any server implemented by at least one of an agent, an application programming interface (API), and a plug-in. In addition, the client 200 may include an application source and/or a client application. According to an embodiment of the present disclosure, the operations of the server 100 described below may be performed in response to a query (e.g., query data) issued from the client 200.

According to an embodiment of the present invention, the server 100 may provide a response corresponding to query data received from the client. In the present invention, query data may be data related to a query that arises during pharmaceutical research, development, and production. For example, the query data may be information about the quality data of a new pharmaceutical under review that is not defined or stored in the existing database. As a specific example, the query data may relate to the overall conditions of a quality verification experiment for a specific pharmaceutical (e.g., the set temperature, humidity, and experiment time of the experimental environment). As another example, the query data may relate to at least some of the quality characteristics of a specific pharmaceutical (e.g., aggregation or a truncated form, which are product-related impurities, or a glycosylation pattern, which is a structural property). The above description of query data is only an example, and the present invention is not limited thereto.
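The two kinds of query data described above could be carried in a single payload. The sketch below is only an illustration of that idea; the `QueryData` class, its field names, and the sample values are hypothetical and do not come from the patent.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class QueryData:
    """Hypothetical container for a query about a pharmaceutical under review."""
    product: str
    # Overall conditions of a quality verification experiment.
    experiment_conditions: dict = field(default_factory=dict)
    # Quality characteristics the query is about.
    quality_attributes: list = field(default_factory=list)


query = QueryData(
    product="candidate-X",
    experiment_conditions={"temperature_c": 25, "humidity_pct": 60, "duration_h": 72},
    quality_attributes=["aggregation", "glycosylation pattern"],
)
payload = asdict(query)  # plain dict, ready to serialize and send to the server
```

Either field may be left empty, which matches the text's point that a query may concern experiment conditions, quality characteristics, or both.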

That is, when the server 100 receives from the client query data related to a query that may arise during the research, development, and production of a pharmaceutical, it may provide a response corresponding to that query data. Here, the response corresponding to the query data may be pharmaceutical analysis information corresponding to the query data. For example, when first query data relates to a query about product and process design, the corresponding pharmaceutical analysis information may include result data predicted from the query data, and adjustment data that proposes adjustments to the input data and to the relationships between elements so as to produce optimal result data. As another example, when second query data relates to a query about quality trend analysis for a trending element, the corresponding pharmaceutical analysis information may include a warning about the possibility of an out-of-specification (OOS) or out-of-trend (OOT) result and information on follow-up actions. As yet another example, when third query data relates to a query about the production process steps of a pharmaceutical, the corresponding pharmaceutical analysis information may include information on the high-risk hazard factors among the key factors of each process step and information on a mitigation plan for reducing that risk. The above description of the various query data and the corresponding pharmaceutical analysis information is only an example, and the present invention is not limited thereto. That is, the server 100 of the present invention may present various responses as described above to queries that arise during the research, development, and production of pharmaceuticals.
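To make the OOS and OOT warnings concrete, the sketch below flags a new result against specification limits and against a trend band. The text does not define how the server detects either condition; the three-sigma band is a common control-chart convention assumed here, and the function name, limits, and sample values are hypothetical.

```python
from statistics import mean, stdev


def check_result(value, history, spec_low, spec_high, sigma=3.0):
    """Return warning labels for a new result.

    OOS: the value lies outside the specification limits.
    OOT: the value lies outside mean +/- sigma * standard deviation of
         past results (an assumed convention, not taken from the text).
    """
    warnings = []
    if not spec_low <= value <= spec_high:
        warnings.append("OOS")
    center, spread = mean(history), stdev(history)
    if abs(value - center) > sigma * spread:
        warnings.append("OOT")
    return warnings


history = [99.8, 100.1, 100.0, 99.9, 100.2]  # hypothetical assay results (%)
in_spec_but_drifting = check_result(101.5, history, spec_low=95.0, spec_high=105.0)
out_of_spec = check_result(106.0, history, spec_low=95.0, spec_high=105.0)
normal = check_result(100.1, history, spec_low=95.0, spec_high=105.0)
```

The first call shows why the two checks are separate: a result can sit comfortably inside the specification and still break from the historical trend.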

According to an embodiment of the present invention, the server 100 may include any type of computer system or computer device, such as a microprocessor, a mainframe computer, a digital single processor, a portable device, or a device controller. Although not shown in FIG. 1, the server 100 may include a database management system (DBMS). The term server 100 may also be used interchangeably with a device for executing queries. The DBMS is a program that allows the server 100 to perform operations such as parsing queries and searching for, inserting, modifying, and/or deleting the required data, and it may be implemented by the processor 130 in the storage unit 120 of the database server 100.

The server 100 may include, as any type of database, a device including a processor 130 and a storage unit 120 for executing and storing instructions, but is not limited thereto. That is, the server 100 may include software, firmware, hardware, or a combination thereof. The software may include one or more applications for creating, deleting, and modifying database tables, schemas, indexes, and/or data. The server 100 may receive transactions from a client or another computing device; exemplary transactions include retrieving, inserting, modifying, deleting, and/or managing records of data, tables, and/or indexes in the server 100.

In addition, the server 100 may store information about a plurality of pharmaceuticals. For example, the server 100 may be a server that acquires and stores a plurality of quality data relating to each of a plurality of pharmaceuticals, together with the critical quality profile corresponding to each quality data. The information stored in the server 100 may be used as training data, validation data, and test data for training the neural network of the present invention. That is, the server 100 may store information about the data sets used to train the neural network model of the present invention.

The server 100 of the present invention may generate the correlation analysis model of the present invention by training a neural network model on a plurality of quality data and the critical quality profile corresponding to each quality data. The correlation analysis model may be a neural network model for analyzing correlations among the plurality of elements included in the plurality of quality data.

In addition, although FIG. 1 shows only one server 100, it will be apparent to those of ordinary skill in the art that more servers may also fall within the scope of the present invention and that the server 100 may include additional components. That is, the server 100 may be composed of a plurality of computing devices. In other words, a set of a plurality of nodes may constitute the server 100.

According to an embodiment of the present invention, the server 100 may be a server that provides a cloud computing service. More specifically, the server 100 may be a server that provides a cloud computing service, a form of Internet-based computing in which information is processed not on the user's own computer but on another computer connected to the Internet. The cloud computing service may be a service that stores data on the Internet so that a user can use the data or programs they need anytime and anywhere through an Internet connection, without installing them on their own computer, and that allows data stored on the Internet to be shared and delivered easily with simple operations and clicks. In addition, the cloud computing service may be a service that not only stores data on a server on the Internet but also lets users perform desired tasks using the functions of applications provided on the web without installing separate programs, and that lets several people work on a document simultaneously while sharing it. The cloud computing service may be implemented in the form of at least one of Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), a virtual machine-based cloud server, and a container-based cloud server. That is, the server 100 of the present invention may be implemented in the form of at least one of the cloud computing services described above. The specific description of the cloud computing service above is only an example, and the present invention may include any platform for building a cloud computing environment.

A detailed description of the training method and training process for the neural network of the present invention, the method of acquiring a plurality of quality data and building a drug quality database based on the acquired quality data, and the method of using the drug quality database and the neural network model to provide drug analysis information corresponding to received query data is given below with reference to FIG. 2.

FIG. 2 is a block diagram of a server for providing a response to query data based on quality data of pharmaceuticals, according to an embodiment of the present invention.

As shown in FIG. 2, the server 100 may include a network unit 110, a storage unit 120, and a processor 130. The components of the server 100 described above are exemplary, and the scope of the present invention is not limited to them. That is, depending on how the embodiments of the present invention are implemented, additional components may be included or some of the components described above may be omitted.

According to an embodiment of the present invention, the server 100 may include a network unit 110 that transmits and receives data to and from the client 200. The network unit 110 may exchange, with other computing devices, servers, and the like, the data for performing the method of providing a response to query data according to an embodiment of the present invention, as well as training data sets for training the neural network model. That is, the network unit 110 may provide a communication function between the server 100 and the client 200. For example, the network unit 110 may receive query data related to a specific pharmaceutical from the client 200. As another example, the network unit 110 may receive, from a cloud server, metadata indexed in correspondence with the query data of the present invention. Additionally, the network unit 110 may allow information to be transferred between the server 100 and the client 200 by way of procedure calls to the server 100.

The network unit 110 according to an embodiment of the present invention may use various wired communication systems such as the Public Switched Telephone Network (PSTN), x Digital Subscriber Line (xDSL), Rate Adaptive DSL (RADSL), Multi Rate DSL (MDSL), Very High Speed DSL (VDSL), Universal Asymmetric DSL (UADSL), High Bit Rate DSL (HDSL), and a local area network (LAN).

In addition, the network unit 110 presented in this specification may use various wireless communication systems such as Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Orthogonal Frequency Division Multiple Access (OFDMA), Single Carrier FDMA (SC-FDMA), and other systems.

In the present invention, the network unit 110 may be configured regardless of its mode of communication, whether wired or wireless, and may be composed of various communication networks such as a personal area network (PAN) and a wide area network (WAN). The network may be the well-known World Wide Web (WWW), and may also use wireless transmission technologies for short-range communication such as Infrared Data Association (IrDA) or Bluetooth. The techniques described in this specification may be used not only in the networks mentioned above but also in other networks.

According to an embodiment of the present invention, the storage unit 120 may include a persistent storage medium and a memory.

The persistent storage medium may refer to a non-volatile storage medium capable of persistently storing arbitrary data, such as magnetic disks, optical disks, and magneto-optical storage devices, as well as storage devices based on flash memory and/or battery-backed memory. Such a persistent storage medium may communicate with the processor 130 and the memory of the server 100 through various communication means. In a further embodiment, the persistent storage medium may be located outside the server 100 and communicate with the server 100.

The memory may refer to a volatile storage device, such as random access memory (RAM) including dynamic random access memory (DRAM) and static random access memory (SRAM), that serves as the main storage device directly accessed by the processor and whose stored information is erased immediately when power is turned off, but it is not limited thereto. The memory may be operated by the processor 130. The memory may temporarily store a data table including data values. In an embodiment of the present disclosure, the data values of the data table may be written from the memory to the persistent storage medium. In a further aspect, the memory includes a buffer cache, and data may be stored in data blocks of the buffer cache. The data stored in the buffer cache may be written to the persistent storage medium by a background process.
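The relationship between the buffer cache and the persistent storage medium can be illustrated with a minimal sketch. The class name `BufferCache`, its methods, and the use of a plain dictionary as a stand-in for the persistent storage medium are all assumptions made for illustration; the specification does not define such an interface. The `flush` method stands for the work a background process would perform.

```python
class BufferCache:
    """Hypothetical in-memory buffer cache in front of a persistent store."""

    def __init__(self, persistent_store):
        self.persistent_store = persistent_store  # stand-in for disk (a dict here)
        self.blocks = {}                          # block id -> data held in memory
        self.dirty = set()                        # blocks not yet written back

    def write(self, block_id, data):
        # Writes land in memory first and are only marked for write-back.
        self.blocks[block_id] = data
        self.dirty.add(block_id)

    def read(self, block_id):
        # Serve from memory when possible, otherwise fall back to the store.
        if block_id in self.blocks:
            return self.blocks[block_id]
        return self.persistent_store.get(block_id)

    def flush(self):
        # What a background process would do: persist every dirty block.
        for block_id in sorted(self.dirty):
            self.persistent_store[block_id] = self.blocks[block_id]
        self.dirty.clear()


disk = {}
cache = BufferCache(disk)
cache.write(1, "row data")
before = dict(disk)   # snapshot taken before the write-back
cache.flush()
```

In this sketch the store stays empty until `flush` runs, which mirrors the temporary nature of the memory described above.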

According to an embodiment of the present invention, the processor 130 may consist of one or more cores and may include a processor for data analysis and deep learning, such as a central processing unit (CPU), a general purpose graphics processing unit (GPGPU), or a tensor processing unit (TPU) of a computing device.

The processor 130 may read a computer program stored in the storage unit 120 and perform data processing for deep learning according to an embodiment of the present invention. According to an embodiment of the present invention, the processor 130 may perform operations for training a neural network. The processor 130 may perform the computations for training a neural network in deep learning (DL), such as processing the input data for training, extracting features from the input data, calculating the error, and updating the weights of the neural network using backpropagation.
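The training computations listed above (forward pass, error calculation, weight update by backpropagation) can be sketched on a deliberately tiny network. The two-weight network, the squared-error loss, the learning rate, and the single training pair are assumptions chosen only to keep the example short; they are not the model or data of the present invention.

```python
import math


def train_step(w1, w2, x, y, lr=0.1):
    """One gradient-descent step for y_hat = w2 * tanh(w1 * x)."""
    # Forward pass
    h = math.tanh(w1 * x)
    y_hat = w2 * h

    # Error calculation (squared-error loss)
    err = y_hat - y
    loss = 0.5 * err ** 2

    # Backward pass: apply the chain rule from the loss back to each weight
    grad_w2 = err * h
    grad_h = err * w2
    grad_w1 = grad_h * (1.0 - h ** 2) * x   # derivative of tanh is 1 - tanh^2

    # Weight update
    return w1 - lr * grad_w1, w2 - lr * grad_w2, loss


w1, w2 = 0.5, 0.5
losses = []
for _ in range(200):
    w1, w2, loss = train_step(w1, w2, x=1.0, y=0.8)
    losses.append(loss)
```

The list `losses` records the loss at each step, so the effect of the repeated weight updates can be inspected directly.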

In addition, at least one of the CPU, GPGPU, and TPU of the processor 130 may process the training of a network function. For example, the CPU and the GPGPU may together process the training of a network function and data classification using the network function. In an embodiment of the present invention, the processors of a plurality of computing devices may also be used together to process the training of a network function and data classification using the network function. The computer program executed on the computing device according to an embodiment of the present invention may be a CPU-, GPGPU-, or TPU-executable program.

In this specification, the term network function may be used interchangeably with artificial neural network and neural network. A network function may include one or more neural networks, in which case the output of the network function may be an ensemble of the outputs of the one or more neural networks.
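One common way to form an ensemble of outputs is element-wise averaging, sketched below. The specification does not say how the ensemble is formed, so averaging is an assumption, and the three stand-in "networks" are ordinary functions invented for illustration.

```python
from statistics import fmean


def ensemble_output(networks, x):
    """Average the output vectors of several networks for the same input."""
    outputs = [net(x) for net in networks]
    # zip(*outputs) pairs up the i-th component of every network's output
    return [fmean(component) for component in zip(*outputs)]


# Hypothetical stand-ins for trained networks: each maps an input to two scores
def net_a(x):
    return [1.0 * x, 2.0]


def net_b(x):
    return [3.0 * x, 4.0]


def net_c(x):
    return [5.0 * x, 6.0]


combined = ensemble_output([net_a, net_b, net_c], 1.0)
```

Other combination rules, such as weighted averaging or majority voting for class labels, would fit the same interface.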

The processor 130 may read a computer program stored in the storage unit 120 and provide the correlation analysis model according to an embodiment of the present invention. According to an embodiment of the present invention, the processor 130 may generate drug analysis information corresponding to query data. According to an embodiment of the present invention, the processor 130 may perform computations for training the correlation analysis model.

According to an embodiment of the present invention, the processor 130 may generally handle the overall operation of the server 100. The processor 130 may provide or process information or functions appropriate to a user or user terminal by processing signals, data, and information input or output through the components described above, or by running an application program stored in the storage unit 120.

According to an embodiment of the present invention, the processor 130 may build a drug quality database. The processor 130 may acquire a plurality of quality data (OQD, Overall Quality Data) corresponding to each of a plurality of pharmaceuticals and build the drug quality database based on the acquired quality data. The quality data corresponding to a pharmaceutical means all data, from basic information about the pharmaceutical through to data for research and development, production, and quality control, and may include profile data relating to the general information and the production and quality definitions of the pharmaceutical, and empirical data corresponding to the profile data. The profile data may include, for example, information on the Quality Target Product Profile, which consists of general information (project name/code, product name/code, product type/form, therapeutic area, indications, trademark/brand name, dosage form, route of administration, excipients, shelf-life, development phase, manufacturing site, batch release site, and target market) and product understanding information (product quality attributes, critical quality attributes, and acceptable ranges for quality attributes). The profile data may also include, for example, information on process understanding, which consists of information on the process flow/process unit operations, input materials per operation, manufacturing scale, key process attributes, process parameters, critical process parameters, material attributes, critical material attributes, process targets for quality attributes, proven acceptable ranges, and design space. The profile data may further include, for example, information on control strategy elements, which consists of information on input materials controls, procedural controls, process parameter controls, in-process testing, specifications, characterization and comparability testing, and process monitoring. The specific description of the profile data above is only an example, and the present invention is not limited thereto.
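The layering of quality data described above can be pictured as nested records. The following is a minimal sketch, assuming free-form dictionaries for each information group; the class names, field names, and sample values are hypothetical and are not a schema taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class QualityTargetProductProfile:
    general: dict = field(default_factory=dict)                # e.g. product code, dosage form
    product_understanding: dict = field(default_factory=dict)  # e.g. critical quality attributes


@dataclass
class ProfileData:
    qtpp: QualityTargetProductProfile = field(default_factory=QualityTargetProductProfile)
    process_understanding: dict = field(default_factory=dict)  # e.g. manufacturing scale
    control_strategy: dict = field(default_factory=dict)       # e.g. in-process testing


@dataclass
class QualityData:
    profile: ProfileData
    empirical: list = field(default_factory=list)  # one dict of observed values per batch


# Illustrative values only
record = QualityData(
    profile=ProfileData(
        qtpp=QualityTargetProductProfile(
            general={"product_code": "A", "dosage_form": "tablet",
                     "route_of_administration": "oral"},
            product_understanding={"critical_quality_attributes": ["assay", "dissolution"]},
        ),
        process_understanding={"manufacturing_scale": "100 kg"},
        control_strategy={"in_process_testing": ["tablet hardness"]},
    ),
    empirical=[{"batch": "B001", "assay": 99.1}],
)
```

Keeping the profile data and the per-batch empirical data in separate fields reflects the distinction the description draws between the defined targets and the values actually observed.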

The empirical data is data on the actual observation environment or observed values of each individual research and development or production batch corresponding to the profile data. For example, it may include the actual observed values of each batch corresponding to the Quality Target Product Profile. It may also include: production (manufacture) information, consisting of information on materials controls, equipment controls, process controls, and hygiene and cleaning for each batch; facility management information, consisting of records such as temperature records for the workroom or equipment, negative pressure records, humidity records, gas management records, and waste and wastewater management records; quality control information, consisting of material and reagent reference data, material and reagent equivalence test data, operator records, labeling records, weighing records, equipment operation results, equipment calibration records, and equipment set values; and product assurance information, consisting of deviations, corrective and preventive actions, change control reports, self-inspection data, environmental monitoring checklists, environmental monitoring results, validation data, all causes of out-of-specification results, out-of-specification results, operator training records, and all production-related risk assessment records. The specific description of the empirical data above is only an example, and the present invention is not limited thereto. A specific method by which the processor 130 builds the drug quality database from a plurality of quality data composed of such profile data and empirical data is described below with reference to FIG. 3.

FIG. 3 is a flowchart illustrating an example of a process of acquiring a plurality of quality data and building a database, according to an embodiment of the present invention.

The processor 130 may acquire a plurality of quality data (S110). Specifically, the processor 130 may acquire a plurality of quality data corresponding to each of a plurality of pharmaceuticals. According to one embodiment, the processor 130 may convert a quality document corresponding to a pharmaceutical into an electronic form compatible with an electronic pharmaceutical quality management system (eQMS) by scanning the document. That is, the processor 130 may scan the quality documents corresponding to each pharmaceutical into image files. The processor 130 may then perform optical character recognition (OCR) on the digitized quality documents to convert them into the electronic form corresponding to the eQMS. In this case, because the conversion to the electronic form is automated through OCR, user convenience may be improved.

According to another embodiment, the processor 130 may acquire the plurality of quality data by receiving input corresponding to a document written as a form through the eQMS. Specifically, the eQMS may include an input window that allows a user to enter an electronic form. The user may enter, in each item of the input window, an input value corresponding to the text of the form, and the processor 130 may acquire quality data by creating an electronic document from the input values. When the conversion to the electronic form is based on user input in this way, a separate document scanning device or optical character reading device is not required, and computing power may be saved accordingly.

In addition, the processor 130 may group the plurality of quality data (S120). Specifically, the processor 130 may group each of the plurality of quality data into one or more data sets based on the critical quality profile (CQP) corresponding to each quality data. The critical quality profile may be information on the key elements that determine the characteristics and properties of a pharmaceutical. The critical quality profile may be composed of at least part of the quality data. For example, it may be composed of at least some of the plurality of elements of the profile data. For example, the critical quality profile may be composed of the general information and the process understanding information in the profile data corresponding to a specific pharmaceutical. As a more specific example, the critical quality profile may include elements such as the product name/code, product type/form, indications, route of administration, process flow/process unit operations, and manufacturing scale. The critical quality profile may be used for searching (or indexing) quality data. The critical quality profile may be information composed of the key elements, among the plurality of elements (or items) included in the quality data, that relate to the characteristics and properties of the pharmaceutical.

In one embodiment, the elements that make up the critical quality profile (that is, the key elements that determine the characteristics and properties of a pharmaceutical) may be freely added or changed by a user involved in the development and production of the pharmaceutical. According to a further embodiment, the elements that make up the critical quality profile may be determined by a neural network model trained with deep learning. Specifically, the trained neural network model may be generated through training on a variety of quality data. Such a neural network model may analyze the correlations among the plurality of elements included in specific quality data and, based on those correlations, generate the critical quality profile corresponding to that quality data. In other words, the trained neural network model identifies which of the elements that make up the quality data are the key elements determining the characteristics and properties of the pharmaceutical, and generates the critical quality profile corresponding to the quality data from the identified elements. The process of analyzing the correlations among the plurality of elements is described in detail later, in the description of the training process of the correlation analysis model.
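The idea of identifying key elements from correlations can be illustrated with a much simpler stand-in than the trained neural network described above. The sketch below uses the Pearson correlation coefficient between each element and a quality outcome, and keeps the elements whose absolute correlation reaches a threshold. The element names, the sample values, the outcome variable, and the threshold of 0.8 are all assumptions for illustration; this is not the correlation analysis model of the present invention.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / (spread_x * spread_y)


def key_elements(columns, outcome, threshold=0.8):
    """Return the names of elements strongly correlated with the outcome."""
    return [name for name, values in columns.items()
            if abs(pearson(values, outcome)) >= threshold]


# Hypothetical per-batch observations
batches = {
    "ph": [6.8, 7.0, 7.2, 7.4],
    "stir_rpm": [100, 300, 100, 300],
}
purity = [95.0, 96.0, 97.0, 98.0]

selected = key_elements(batches, purity)
```

A linear coefficient only captures pairwise linear relationships; a trained model, as the description envisages, could capture interactions that this sketch would miss.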

In other words, the processor 130 may determine the elements that make up the critical quality profile either from user input or through a pre-trained neural network model. Accordingly, the critical quality profile corresponding to given quality data may be obtained from the key elements (or certain items) of that quality data. For example, a critical quality profile relating to specific elements may be obtained for each of a plurality of quality data corresponding to various pharmaceuticals.

In addition, the processor 130 may group the plurality of quality data into one or more data sets based on the main quality profile corresponding to each of the plurality of quality data. That is, through grouping by the processor 130, each of the plurality of quality data may be classified into one of the one or more data sets, with the main quality profile as the criterion. For example, the one or more data sets may include a first data set, a second data set, a third data set, and so on. In this case, each data set may have been classified based on the main quality profile corresponding to each quality data. That is, the one or more first quality data included in the first data set and the one or more second quality data included in the second data set may be distinguished from each other by their different main quality profiles. As a more specific example, the product name/code of the main quality profile corresponding to the one or more first quality data included in the first data set may be 'A', whereas the product name/code of the main quality profile corresponding to the one or more second quality data included in the second data set may be 'B'. In the foregoing description, grouping based on 'product name/code' among the plurality of elements of the main quality profile is described by way of example for convenience of explanation, but it will be apparent to those skilled in the art that the grouping may further be based on various other elements of the main quality profile (e.g., product type/form, indication, route of administration, process flow/unit process, and manufacturing scale).
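The grouping described above can be illustrated with a minimal Python sketch. The record layout, the field names (`product_code`, `route`) and the helper `group_by_profile` are hypothetical and chosen only for illustration; the description does not specify a data structure.

```python
from collections import defaultdict

# Hypothetical quality-data records; field names are illustrative assumptions.
records = [
    {"id": 1, "product_code": "A", "route": "IV"},
    {"id": 2, "product_code": "B", "route": "oral"},
    {"id": 3, "product_code": "A", "route": "IV"},
]

def group_by_profile(records, keys=("product_code",)):
    """Group records into data sets keyed by the chosen profile elements."""
    data_sets = defaultdict(list)
    for rec in records:
        # The tuple of selected element values acts as the main quality profile.
        profile = tuple(rec[k] for k in keys)
        data_sets[profile].append(rec)
    return dict(data_sets)

data_sets = group_by_profile(records)
```

Passing additional keys, for example `keys=("product_code", "route")`, would group on several profile elements at once, in line with the remark that elements other than the product name/code may be used.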

According to an additional embodiment, the processor 130 may encrypt each of the plurality of quality data corresponding to the plurality of pharmaceuticals using a symmetric key encryption algorithm, and may transmit the encrypted quality data to a separate external server (e.g., a cloud server). Additionally, the processor 130 may transmit a secret key (or public key) for decrypting the encrypted data to the external server. In this case, the external server may be a database server that receives and stores quality data related to various pharmaceuticals, and the quality data may be encrypted before transmission to the external server in order to prevent leakage of the quality data. That is, encrypted data cannot be read even if it is intercepted in transit. Accordingly, confidentiality of the data may be maintained during transmission and reception. When the database server receives the encrypted data and the secret key (or public key), it may decrypt the data through a symmetric key decryption algorithm and classify the decrypted data based on the main quality profile corresponding to each data. According to an embodiment, the symmetric key encryption and decryption algorithm may be the AES-256 (Advanced Encryption Standard) algorithm, a symmetric key algorithm that uses the same key for encryption and decryption. The foregoing description of the symmetric key algorithm is merely an example, and the present disclosure is not limited thereto.

That is, data encrypted and transmitted to the external server (e.g., a database server) is decrypted at the external server, and the decrypted quality data may be divided into one or more data sets based on the main quality data and stored.

In addition, the processor 130 may derive correlations between the elements included in the plurality of quality data (S130). To this end, the processor 130 may generate a correlation analysis model that derives correlations between elements by learning, through an association rule analysis algorithm, the correlations between the elements constituting each of the plurality of quality data. The correlation analysis model may then be used to derive the correlations between the elements. The processor 130 may process the plurality of quality data as input to the correlation analysis model to derive the correlations between the elements included in the quality data.

More specifically, the correlation analysis model may be a neural network model that analyzes the association between at least two elements through an association rule analysis algorithm. The association rule analysis algorithm may be an algorithm that generates a series of rules indicating whether a set of at least two elements occurs frequently. In an embodiment, the association rule analysis algorithm may relate to a market basket analysis algorithm that analyzes, based on large-scale big data, the correlation between a specific pharmaceutical and the elements it includes. The market basket analysis algorithm may be implemented, for example, through at least one of the Apriori algorithm, the FP-Growth algorithm, and the DHP algorithm.

In an embodiment, the processor 130 may generate numerous rules among the plurality of quality data and the elements included in each quality data, and may evaluate the generated rules based on support, confidence, and lift to derive the correlations between the elements. More specifically, the processor 130 may construct training data based on each of the plurality of elements constituting the quality data. For example, the training data may be constructed as a table in which each of the plurality of elements is a column and each entry is marked true or false according to whether the element is present. The processor 130 may examine the constructed training data, generate various rules based on the elements that are present, and use the association rule analysis algorithm (or market basket analysis algorithm) to select rules whose support, confidence, and lift are at or above certain reference values, thereby deriving the correlations between the corresponding elements. For example, support may be defined as the probability that the conditional clause (if) occurs. Confidence is used to measure the strength of the association between elements and may be defined as the conditional probability that the result clause occurs given the conditional clause. Lift, which is used to determine whether a generated rule has practical value, may relate to the ratio of how often the two events occur together compared with how often they would if the conditional clause and the result were independent of each other. For example, when the lift is 1, the conditional clause and the result may be independent of each other. This means that there is no meaningful association within the rule, which may mean that the elements of the rule are unrelated.
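As a rough illustration of the three rule metrics, the sketch below computes them for a single rule over a true/false table of the kind described above. It assumes the common textbook definitions, in which support is the joint frequency of the conditional clause and the result; the column names and the function `rule_metrics` are hypothetical.

```python
def rule_metrics(rows, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    rows is a list of dicts mapping element names to True/False.
    """
    n = len(rows)
    n_a = sum(1 for r in rows if r[antecedent])
    n_c = sum(1 for r in rows if r[consequent])
    n_both = sum(1 for r in rows if r[antecedent] and r[consequent])
    if n == 0 or n_a == 0 or n_c == 0:
        raise ValueError("metrics are undefined for empty or never-true columns")
    support = n_both / n          # P(antecedent and consequent)
    confidence = n_both / n_a     # P(consequent | antecedent)
    lift = confidence / (n_c / n) # equals 1 when the two are independent
    return support, confidence, lift

# Hypothetical training table: one row per quality record.
rows = [
    {"high_acetate": True, "low_growth": True},
    {"high_acetate": True, "low_growth": True},
    {"high_acetate": False, "low_growth": False},
    {"high_acetate": False, "low_growth": True},
]
support, confidence, lift = rule_metrics(rows, "high_acetate", "low_growth")
```

A rule would be kept only if all three values meet the chosen reference values. A lift above 1, as in this toy table, suggests the two elements occur together more often than independence would predict.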

That is, as described above, the processor 130 may derive the correlations between the elements constituting the quality data by using the correlation analysis model trained through the association rule analysis algorithm. The correlation between the elements constituting the quality data may be, for example, information on the precedence relationship and order between the elements. That is, the correlation may mean the precedence relationship defined between elements through the association rule algorithm for each of the quality data collected and organized for different pharmaceuticals.

As a specific example, it may be derived that the composition of each liquid medium required for cell culture and the cell growth rate correlate with factors that inhibit cell growth, such as acetate concentration and magnesium ion concentration. In addition, a correlation may be derived regarding which range of induction time points, for a given cell growth rate, yields the highest target protein production rate. The foregoing description of correlations derived between various elements is merely an example, and the present invention is not limited thereto.

In addition, the processor 130 may build a drug quality database by converting the plurality of quality data into metadata (S140). Specifically, the processor 130 may build the drug quality database by converting the plurality of quality data into metadata based on the correlations between the elements. The conversion into metadata may serve to turn the plurality of quality data into big data. That is, based on the derived correlations between elements, the processor 130 may additionally link information on precedence relationships, index information (e.g., index information on rows and columns in the DB), profile data, empirical data, and the like to form the metadata. Through this conversion, the plurality of quality data may be turned into big data, which may improve processing efficiency when the drug quality database is searched for data similar to specific query data, as described later. Hereinafter, the process of providing a response corresponding to query data is described in detail with reference to FIG. 4.

FIG. 4 is a flowchart illustrating a process of building a database by acquiring a plurality of quality data according to an embodiment of the present invention.

According to an embodiment of the present invention, the processor 130 may obtain query data (S210). The query data of the present invention may be data relating to a query that arises during pharmaceutical research and development and production. For example, the query data may be information on the quality data of a new pharmaceutical under review that is not defined or stored in the existing database. As a specific example, the query data may relate to the overall conditions (e.g., set temperature, humidity, and experiment time of the experimental environment) for developing a production process and analytical method for a specific pharmaceutical, or for a quality verification experiment on a specific pharmaceutical. As another example, the query data may relate to at least some of the quality attributes of a specific pharmaceutical (e.g., aggregation or a truncated form, which are product-related impurities and thus one of the quality attributes of a pharmaceutical, or a glycosylation pattern, which is a structural property). The foregoing description of the query data is merely an example, and the present invention is not limited thereto.

Such query data may be obtained through a query issued by a client. The query data may also be obtained as it is entered through a quality document form on the eQMS. Specifically, the eQMS may include an input window for accepting query input from a user. The user may enter data relating to a query arising during pharmaceutical research and development and production in the input window, and the processor 130 may obtain the query data based on the entered value.

According to an additional embodiment, the processor 130 may encrypt the query data using a symmetric key encryption algorithm, and may transmit the encrypted query data to a separate external server (e.g., a cloud server). Additionally, the processor 130 may transmit a secret key (or public key) for decrypting the encrypted data to the external server. In this case, the query data may be encrypted before transmission to the external server in order to prevent leakage of the query data. That is, encrypted data cannot be read even if it is intercepted in transit. Accordingly, confidentiality of the data may be maintained during transmission and reception.

According to an embodiment of the present invention, the processor 130 may convert the query data into metadata (S220). For example, the decrypted query data may be classified into a specific data set having the same main quality profile, and the query data may be converted into metadata as index information on the rows and columns corresponding to that data set and profile data information are linked as metadata. That is, the processor 130 may convert the decrypted query data into metadata by linking, as metadata, index information in the DB, electronic form information, and empirical data information associated with the profile data. As the query data is converted into metadata, search efficiency within the big data may be improved.

According to an embodiment of the present invention, the processor 130 may perform an operation on the query data (S230). The operation on the query data may be an operation for selecting, from the drug quality database, quality data whose similarity to the query data is at or above a certain level.

Specifically, the processor 130 may select one or more similar data sets 310 by searching the drug quality database based on the query metadata. Here, the one or more similar data sets 310 may mean data sets whose similarity to the query data is at or above a certain criterion. The processor 130 may select, from the drug quality database, one or more similar data sets 310 whose similarity to the main quality profile corresponding to the query data is at or above a threshold similarity score. The drug quality database may store the plurality of quality data grouped into one or more data sets based on a main quality profile (i.e., CQP). That is, the processor 130 may calculate the similarity between the main quality profile corresponding to the query data and each of the main quality profiles representing the respective data sets in the big data, and may select only the data sets at or above the threshold similarity score as the one or more similar data sets 310. In an embodiment, the similarity comparison between main quality profiles may be performed by vectorizing the two objects being compared (i.e., the main quality profile corresponding to the query data and the main quality profile corresponding to a specific data set) through a functional representation of their text, and comparing the cosine similarity between the resulting vectors. For example, the text may be vectorized based on the Word2Vec algorithm. For example, the calculated similarity value may be closer to 1 the more the compared objects match or resemble each other, and closer to -1 the more they differ. The foregoing description of the method for comparing similarity between texts is merely an example, and the present invention is not limited thereto.
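The selection step can be sketched in Python as follows. This is only an illustration under stated assumptions: the short numeric vectors stand in for Word2Vec embeddings of profile text (no embedding is actually computed here), and the names `cosine_similarity`, `select_similar_sets`, `set_1` to `set_3`, and the threshold of 0.5 are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

def select_similar_sets(query_vec, set_profiles, threshold):
    """Keep the data sets whose profile vector scores at or above the threshold."""
    return [
        name
        for name, vec in set_profiles.items()
        if cosine_similarity(query_vec, vec) >= threshold
    ]

# Placeholder vectors standing in for embedded main quality profiles.
query = [1.0, 0.0, 1.0]
profiles = {
    "set_1": [1.0, 0.0, 1.0],    # same direction as the query
    "set_2": [0.0, 1.0, 0.0],    # orthogonal to the query
    "set_3": [-1.0, 0.0, -1.0],  # opposite direction
}
similar = select_similar_sets(query, profiles, threshold=0.5)
```

Only one vector per data set is compared with the query, which is the source of the reduced computation the description attributes to profile-level comparison.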

That is, rather than comparing each of the plurality of items, the processor 130 may select the one or more similar data sets 310 corresponding to the query data by considering only the items corresponding to the main quality profile. In other words, because the drug quality database is grouped into one or more data sets by main quality profile, the processor 130 may select quality data having a similarity at or above a certain reference value by comparing the main quality profile corresponding to the query data with each of the main quality profiles representing the respective data sets. In this case, because the main quality profile comprises far fewer items than the quality data, the computing power consumed by the operation (i.e., the similarity comparison) may be reduced. Additionally, because the query data is compared with the main quality profiles representing the grouped data sets, rather than with each of the various quality data corresponding to the plurality of pharmaceuticals, the amount of computation may be significantly reduced. That is, as the amount of computation required to select data similar to the query data is reduced, computational efficiency may be improved, for example through shorter computation time.

According to an embodiment of the present invention, the threshold similarity score used as the criterion for similarity comparison (or for selecting similar data sets) may be calculated based on the main quality profiles corresponding to the respective data sets. Specifically, the processor 130 may calculate similarity scores between the main quality profiles corresponding to the respective data sets, and the threshold similarity score may be calculated based on the one or more similarity scores generated for the data set pairs. For example, when the similarity scores generated for the data set pairs are 70, 80, 60, and 95, the processor 130 may determine the threshold similarity score to be 76.25, the average of those similarity scores. The specific numerical values given above for the similarity scores and the threshold similarity score are merely examples, and the present disclosure is not limited thereto.
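The averaging step can be sketched in Python as below, assuming the threshold is the plain arithmetic mean of the pairwise scores as in the numerical example. The function `threshold_score` and its `similarity` argument are hypothetical; any pairwise scoring function could be supplied.

```python
from itertools import combinations
from statistics import mean

def threshold_score(profiles, similarity):
    """Average the similarity over every unordered pair of data-set profiles."""
    pair_scores = [similarity(a, b) for a, b in combinations(profiles, 2)]
    return mean(pair_scores)

# Reproducing the numerical example: four pairwise scores averaged directly.
example_threshold = mean([70, 80, 60, 95])  # 76.25

# Toy usage with a stand-in scoring function (absolute difference of numbers).
toy_threshold = threshold_score([1, 2, 4], lambda a, b: abs(a - b))
```

Because the threshold is derived from the stored data sets themselves, it moves up or down with how similar the data sets are to one another overall.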

The threshold similarity calculation method described above may serve to reflect the tendency of the data sets as a whole. For example, a one-to-one similarity comparison between data sets may yield many high scores or many low scores. Accordingly, the present invention selects the threshold similarity score (i.e., sets the threshold for selecting similar data) based on a similarity comparison across all data sets (i.e., a comparison of the main quality profiles of the respective data sets), so that the search for similar data sets reflects this overall similarity tendency. As a result, the tendency of the data as a whole may be reflected when similar data is selected for the query data, which may ensure improved reliability in the selection of similar data sets.

According to an embodiment of the present invention, the processor 130 may perform data grouping by classifying the plurality of quality data corresponding to the one or more similar data sets 310 together with the query data. Specifically, the processor 130 may identify one or more elements constituting each of the plurality of quality data corresponding to the one or more similar data sets. The processor 130 may then perform data grouping by classifying the plurality of quality data and the query data into one or more data groups based on the one or more elements.

In this case, the grouping of the one or more similar data sets 310 may mean the regrouping 320 shown in FIG. 5. The regrouping 320 may mean regrouping the quality data and the query data, which were grouped based on the main quality profile, based on the plurality of elements constituting the respective quality data and query data. This regrouping 320 may be performed through a classification model trained based on the k-means algorithm. Specifically, the processor 130 may vectorize the elements constituting the quality data and represent them in a space of arbitrary dimension, and may set k centroids based on the initial clusters formed by the elements. After setting the k centroids, the processor 130 may assign the centroids based on the distances between the clusters formed by the elements. In other words, each centroid may be assigned at a position close to the elements. Thereafter, the processor 130 may update each centroid by moving it to the center of its cluster. The processor 130 may optimize the algorithm by repeating the centroid assignment and update steps until the cluster assignments no longer change, or until a predetermined tolerance or maximum number of iterations is reached. For example, the processor 130 may perform the optimization by repeatedly calculating the sum of squared errors each time the centroids change and identifying when the change falls within a certain tolerance.
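The assignment and update loop described above can be sketched in plain Python as follows. This is a minimal illustration rather than the claimed model: it assumes Euclidean distance, initializes the centroids naively from the first k points, and stops on unchanged assignments or a small centroid shift instead of tracking the sum of squared errors. The function `kmeans` and the (pH, temperature) points are hypothetical.

```python
import math

def kmeans(points, k, max_iter=100, tol=1e-6):
    """Cluster points into k groups; returns (assignments, centroids)."""
    # Naive initialization (an assumption): use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:
                new_centroids.append([sum(x) / len(members) for x in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        converged = new_assignments == assignments or shift <= tol
        assignments = new_assignments
        if converged:
            break
    return assignments, centroids

# Hypothetical element values per record: (pH, temperature at some step).
points = [(7.0, 30.0), (5.0, 37.0), (7.1, 30.5), (5.2, 36.5)]
assignments, centroids = kmeans(points, k=2)
```

In practice the element values would likely be scaled before clustering, since a raw Euclidean distance lets the element with the largest numeric range dominate.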

That is, through the k-means-based classification model generated by the above process, the data may be regrouped based on the respective values indicated by the plurality of elements of the plurality of quality data (e.g., the quality data included in the one or more similar data sets).

For example, when the one or more similar data sets 310 selected for the query data are a first similar data set 311, a second similar data set 312, and a third similar data set 313, the processor 130 may identify all quality data included in each similar data set and perform the regrouping 320 based on the values indicated by the elements of the identified quality data (e.g., pH, temperature at a specific step), thereby forming a first data group 331, a second data group 332, and a third data group 333. For example, data with similar element values may be classified into the same data group.

That is, the processor 130 may regroup data that had been divided according to the existing main quality profile (CQP) into one or more data groups 330 based on the values indicated by the elements. In this case, the query data may also be included in a regrouped data group. In other words, during regrouping, the query data may also be classified into the data group that its values indicate. For example, as regrouping is performed, the query data may be classified into the first data group 331.

According to an embodiment, the processor 130 may use a prediction model trained based on the K-Nearest Neighbors (KNN) algorithm to predict, within the data group containing the query data, the data closest to or matching the query data. The KNN algorithm is generated through training of a neural network based on supervised learning, and may be a neural network model that finds the nearest training data point as the nearest neighbor and uses it for prediction. For example, a training data set may be formed based on the plurality of elements of the quality data.

That is, the processor 130 may use the KNN-based prediction model to identify, within the data group containing the query data, the data most similar to the query data.
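A nearest-neighbor lookup within one data group can be sketched as below. The sketch assumes a brute-force search with Euclidean distance over numeric element values; the function `nearest_neighbors`, the batch identifiers, and the (pH, temperature) values are hypothetical.

```python
import math

def nearest_neighbors(query, group, k=1):
    """Return the names of the k records in group closest to the query vector."""
    ranked = sorted(group.items(), key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in ranked[:k]]

# Hypothetical records in the data group that contains the query data.
group = {
    "batch_001": (7.0, 30.0),
    "batch_002": (6.8, 31.0),
    "batch_003": (5.0, 37.0),
}
query = (6.9, 30.2)
closest = nearest_neighbors(query, group, k=2)
```

Restricting the search to the query's own data group keeps the number of distance computations small compared with searching the whole database.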

According to an embodiment of the present invention, the processor 130 may provide drug analysis information corresponding to the query data based on the data grouping result. Specifically, the processor 130 may provide the drug analysis information by calculating the correlations between elements with reference to the data group into which the query data is classified.

In more detail, the processor 130 may use the correlation analysis model to derive the correlation between the data group into which the query data is classified and each of the remaining data groups among the one or more data groups. In this case, the one or more data groups may have been regrouped based on the respective elements. The processor 130 may then provide drug analysis information corresponding to the query data based on the correlations between the data groups.

Here, the correlation analysis model may be a neural network model that derives correlations between elements by learning, through an association rule analysis algorithm, the correlations between the elements constituting each of the plurality of quality data.

The correlation between the elements constituting the quality data may be, for example, information on the precedence relationship and order between the elements. That is, the correlation may mean the precedence relationship defined between elements through the association rule algorithm for each of the quality data collected and organized for different pharmaceuticals.

That is, the processor 130 may analyze the correlation between the data group into which the query data is classified and the other data groups by using a correlation analysis model pre-trained to derive the correlation between the elements included in the plurality of quality data, and may provide drug analysis information based on the analyzed correlation. In other words, the correlation between the data group containing the query data and the other data groups can be derived through the correlation analysis model. For example, when the first data group to which the query data belongs relates to 'aggregation', the other data groups may be sorted and provided in descending order of their degree of correlation with the first data group. In one example, when the user selects a second data group (in-process control or release testing) from among the sorted data groups, drug analysis information may be generated and provided based on the values included in that second data group.
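The sorting of data groups by degree of correlation can be illustrated with the standard association rule metrics (support, confidence, lift). The following is a minimal sketch only: the specification describes a trained correlation analysis model, whereas this example computes the metrics directly, and the record contents and the function names `support`, `confidence`, `lift`, and `rank_related` are hypothetical.

```python
def support(records, items):
    """Fraction of records that contain every element in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the records."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0
    return support(records, set(antecedent) | set(consequent)) / base


def lift(records, antecedent, consequent):
    """Confidence normalized by how common the consequent is overall."""
    base = support(records, consequent)
    if base == 0:
        return 0.0
    return confidence(records, antecedent, consequent) / base


def rank_related(records, query_element):
    """Sort every other element by lift against the query element, highest first."""
    candidates = set().union(*records) - {query_element}
    scored = [(e, lift(records, {query_element}, {e})) for e in candidates]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical quality records, each reduced to a set of element labels.
records = [
    {"aggregation", "in_process_control", "high_risk"},
    {"aggregation", "in_process_control"},
    {"aggregation", "release_test"},
    {"release_test", "low_risk"},
]

for element, score in rank_related(records, "aggregation"):
    print(f"{element}: lift={score:.2f}")
```

A lift above 1 suggests the element co-occurs with the query element more often than chance would predict, which is one plausible way to order the groups presented to the user.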

As a more specific example, when query data of 'aggregation' is obtained, the processor 130 may use the main quality profile of the candidate substance corresponding to the query data as an index and convert into metadata the quality risk analysis data, profile data such as quality characteristics and product-related impurities, and empirical data including acceptable ranges. In addition, the processor 130 may select, from the drug quality database, one or more data sets whose main quality profile meets or exceeds a reference degree of similarity, regroup those data sets so that the query data is classified into the data group corresponding to the 'quality risk analysis data' element, and, through correlation analysis between the data group into which the query data is classified and the other data groups, determine that 'aggregation' is high-risk and provide drug analysis information accordingly. The detailed description of the above-described process of providing drug analysis information is only an example, and the present disclosure is not limited thereto.

The drug analysis information of the present invention is information about a response to a user's inquiry related to the development and production of a drug, and may include at least one of drug relation element information, drug trend information, and drug process risk information.

Here, the drug relation element information may include, when an experimental method is being designed, information on the result data predicted for the query data, the input data that leads to optimal result data, and adjustment data proposing adjustments to the relationships between the elements. For example, when a monoclonal antibody is to be developed, the drug relation element information may include information on antibody-dependent cellular cytotoxicity (ADCC) and the fucosylation level as major quality characteristics of the candidate substance. As an example, the average partial CO2 level may be proposed as a critical process parameter (CPP), and a process control range for that parameter may be provided. The detailed description of the above-described drug relation element information is only an example, and the present disclosure is not limited thereto.

In addition, the drug trend information may include, when quality trend analysis is performed on a trend element, a warning about the possibility of a deviation from a specification or a deviation from a trend, together with information on follow-up measures. For example, when insulin analogs are produced, the drug trend information may predict the possibility that the concentration of residual c-peptide, one of the drug substance release test items (DS release QC testing) of the candidate substance, follows a consistent trend and exceeds the upper limit, resulting in a deviation, and may include information on when the deviation is expected to occur and on its cause. For example, the column recycle interval of ion exchange chromatography (ion exchange-HPLC), the second purification chromatography step, which can affect residual c-peptide, may be identified as the cause, and new column packing may be proposed as a solution. The detailed description of the above-described drug trend information is only an example, and the present disclosure is not limited thereto.
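One simple way to anticipate when a trending test item would cross its upper limit is to fit a straight line to recent batch results and extrapolate. The sketch below is an illustrative assumption, not the method of the specification: the least-squares fit, the function names `fit_line` and `batches_until_limit`, and the sample values are all hypothetical.

```python
import math


def fit_line(values):
    """Ordinary least-squares slope and intercept over batch indices 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two batch results are needed to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def batches_until_limit(values, upper_limit):
    """Number of future batches until the fitted trend reaches the upper limit.

    Returns None when the fitted trend is flat or decreasing.
    """
    slope, intercept = fit_line(values)
    if slope <= 0:
        return None
    crossing_index = (upper_limit - intercept) / slope
    last_index = len(values) - 1
    return max(0, math.ceil(crossing_index - last_index))


# Hypothetical residual c-peptide results for consecutive batches (arbitrary units).
history = [1.0, 2.0, 3.0, 4.0]
print(batches_until_limit(history, upper_limit=6.0))
```

A real trending program would typically also account for measurement noise and use statistically derived alert limits; the linear extrapolation here only shows the basic idea of predicting the timing of a deviation.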

In addition, the drug process risk information may be information on high-risk hazard factors among the major factors of each process step in production, together with information on a mitigation plan for reducing that risk. For example, when specific equipment is used in the finished-product process of biopharmaceutical production, the drug process risk information may include information that flags frequent deviations in the vial cleaning process and, to prevent them, presents a mitigation plan such as a shorter replacement cycle for the water for injection (WFI) line.

As described above, the processor 130 may build a drug quality database from a large amount of digitized quality data covering drugs in general, obtained through an eQMS, and, by using that database as big data, may present responses (or answers) to the various issues that arise during the research, development, and production of drugs. Because these responses are based on correlation analysis between the components of many different drugs, meaningful predictive information can be provided to users in the development, research, and production stages.

FIG. 6 is a flowchart exemplarily illustrating a method of providing a response to query data based on quality data of pharmaceuticals according to an embodiment of the present invention.

According to an embodiment of the present invention, the method may include obtaining query data (S310).

According to an embodiment of the present invention, the method may include converting the query data into metadata (S320).

According to an embodiment of the present invention, the method may include selecting one or more similar data sets by searching the drug quality database based on the query data (S330).

According to an embodiment of the present invention, the method may include performing data grouping by classifying the query data and a plurality of quality data corresponding to the one or more similar data sets (S340).

According to an embodiment of the present invention, the method may include providing drug analysis information corresponding to the query data based on the data grouping result (S350).

The order of the steps illustrated in FIG. 6 may be changed as necessary, and one or more steps may be omitted or added. That is, the above-described steps are merely one embodiment of the present invention, and the scope of the present invention is not limited thereto.
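The flow of steps S310 to S350 can be sketched end to end as follows. This is a deliberately simplified illustration under stated assumptions: metadata is reduced to a set of lowercase keywords, similarity is a Jaccard index, grouping is by a single `element` label, and the returned "analysis" is only a count per group. None of these choices, nor the function and field names, come from the specification.

```python
def to_metadata(query):
    """S320: reduce the query text to a set of lowercase keywords (assumed form)."""
    return set(query.lower().split())


def jaccard(a, b):
    """Similarity of two keyword sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_similar(metadata, database, threshold):
    """S330: keep the records whose profile is similar enough to the query."""
    return [r for r in database if jaccard(metadata, r["profile"]) >= threshold]


def group_by_element(records):
    """S340: group the selected records by the element they describe."""
    groups = {}
    for record in records:
        groups.setdefault(record["element"], []).append(record)
    return groups


def answer(query, database, threshold=0.3):
    """S310 to S350: return the number of similar records found per element group."""
    metadata = to_metadata(query)
    similar = select_similar(metadata, database, threshold)
    groups = group_by_element(similar)
    return {element: len(members) for element, members in groups.items()}


# Hypothetical contents of a drug quality database.
database = [
    {"profile": {"aggregation", "antibody"}, "element": "quality_risk_analysis"},
    {"profile": {"aggregation"}, "element": "release_test"},
    {"profile": {"potency"}, "element": "release_test"},
]

print(answer("Aggregation antibody", database))
```

In the described method the grouping step also classifies the query data itself into one of the groups and the final step applies correlation analysis; both are omitted here to keep the sketch short.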

FIG. 7 is a schematic diagram illustrating one or more network functions according to an embodiment of the present invention.

Throughout this specification, the terms computational model, neural network, and network function may be used interchangeably. A neural network may be composed of a set of interconnected computational units, which may generally be referred to as "nodes". These "nodes" may also be referred to as "neurons". A neural network includes at least one node. The nodes (or neurons) constituting a neural network may be interconnected by one or more "links".

Within a neural network, one or more nodes connected through a link may form a relative relationship of input node and output node. The concepts of input node and output node are relative: a node that is an output node with respect to one node may be an input node with respect to another node, and vice versa. As described above, an input node to output node relationship may be created around a link. One or more output nodes may be connected to one input node through links, and vice versa.

In an input node and output node relationship connected through one link, the value of the output node may be determined based on the data input to the input node. Here, the link interconnecting the input node and the output node may have a weight. The weight may be variable, and may be changed by a user or an algorithm so that the neural network performs a desired function. For example, when one or more input nodes are each connected to one output node by a respective link, the output node value may be determined based on the values input to the input nodes connected to that output node and the weights set on the links corresponding to the respective input nodes.
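The computation of one output node value from its input values and link weights can be sketched as a weighted sum followed by an activation function. The sigmoid activation and the `bias` term below are assumptions for illustration; the text only states that the value depends on the inputs and the link weights.

```python
import math


def output_value(inputs, weights, bias=0.0):
    """Weighted sum of the input node values, passed through a sigmoid.

    `weights[i]` is the weight on the link from input node i to the output node.
    """
    if len(inputs) != len(weights):
        raise ValueError("each input node needs exactly one link weight")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Two input nodes connected to one output node (illustrative values).
print(output_value([0.5, -1.0], [0.8, 0.2]))
```

Changing any link weight changes the output for the same inputs, which is why two networks with identical topology but different weights behave as different networks.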

As described above, in a neural network, one or more nodes are interconnected through one or more links to form input node and output node relationships within the network. The characteristics of the neural network may be determined by the number of nodes and links, the connections between the nodes and the links, and the value of the weight assigned to each link. For example, when two neural networks have the same number of nodes and links but different weight values on the links, the two neural networks may be recognized as different from each other.

A neural network may include one or more nodes. Some of the nodes constituting the neural network may form one layer based on their distance from the initial input node. For example, the set of nodes at distance n from the initial input node may constitute layer n. The distance from the initial input node may be defined as the minimum number of links that must be traversed to reach the node from the initial input node. However, this definition of a layer is arbitrary and given for explanation, and the order of a layer within the neural network may be defined differently. For example, the layer of a node may instead be defined by its distance from the final output node.
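The layer definition above, where layer n is the set of nodes whose minimum link distance from an initial input node is n, corresponds to a breadth-first traversal. The sketch below assumes the network is given as a dictionary mapping each node to the nodes its outgoing links reach; the representation and the node names are hypothetical.

```python
from collections import deque


def layers_by_distance(links, initial_inputs):
    """Group nodes by the minimum number of links from any initial input node."""
    distance = {node: 0 for node in initial_inputs}
    queue = deque(initial_inputs)
    while queue:
        node = queue.popleft()
        for downstream in links.get(node, []):
            if downstream not in distance:  # first visit gives the shortest distance
                distance[downstream] = distance[node] + 1
                queue.append(downstream)
    layers = {}
    for node, d in distance.items():
        layers.setdefault(d, []).append(node)
    return {d: sorted(nodes) for d, nodes in sorted(layers.items())}


# A small feed-forward network: two inputs, two hidden nodes, one output.
links = {"x1": ["h1", "h2"], "x2": ["h1", "h2"], "h1": ["y"], "h2": ["y"]}
print(layers_by_distance(links, ["x1", "x2"]))
```

Running the same traversal on the reversed links, starting from the final output nodes, would give the alternative layer definition mentioned in the text.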

The initial input node may mean one or more nodes, among the nodes in the neural network, to which data is input directly without passing through a link from another node. Alternatively, in terms of link-based relationships between nodes within the neural network, it may mean nodes that have no other input node connected to them by a link. Similarly, the final output node may mean one or more nodes in the neural network that have no output node in relation to the other nodes. A hidden node may mean a node constituting the neural network that is neither an initial input node nor a final output node. A neural network according to an embodiment of the present invention may have the same number of nodes in the input layer as in the output layer, with the number of nodes decreasing and then increasing again from the input layer through the hidden layers. A neural network according to another embodiment of the present invention may have fewer nodes in the input layer than in the output layer, with the number of nodes decreasing from the input layer through the hidden layers. A neural network according to yet another embodiment of the present invention may have more nodes in the input layer than in the output layer, with the number of nodes increasing from the input layer through the hidden layers. A neural network according to still another embodiment of the present invention may be a combination of the neural networks described above.

A deep neural network (DNN) may mean a neural network that includes a plurality of hidden layers in addition to an input layer and an output layer. A deep neural network can be used to identify the latent structures of data, that is, the latent structures of photos, text, video, speech, and music (for example, what objects are in a photo, what the content and sentiment of a text are, or what the content and emotion of a speech recording are). Deep neural networks may include a convolutional neural network (CNN), a recurrent neural network (RNN), an auto encoder, generative adversarial networks (GAN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a Q network, a U network, a Siamese network, and the like. The above description of deep neural networks is only an example, and the present invention is not limited thereto.

A neural network may be trained by at least one of supervised learning, unsupervised learning, and semi-supervised learning. The purpose of training is to minimize the error of the output. Training is a process of repeatedly inputting training data into the neural network, calculating the error between the network's output for the training data and the target, and backpropagating that error from the output layer toward the input layer in the direction that reduces it, thereby updating the weight of each node of the neural network. Supervised learning uses training data in which each item is labeled with the correct answer (that is, labeled training data), whereas in unsupervised learning the items may not be labeled with the correct answer. For example, the training data for supervised learning of data classification may be data in which each item is labeled with a category. The labeled training data is input to the neural network, and the error can be calculated by comparing the output (category) of the neural network with the label of the training data. As another example, in unsupervised learning of data classification, the error may be calculated by comparing the input training data with the output of the neural network. The calculated error is backpropagated through the neural network in the reverse direction (that is, from the output layer toward the input layer), and the connection weights of the nodes in each layer may be updated accordingly. The amount by which each connection weight changes may be determined by the learning rate. One computation of the neural network on the input data together with the backpropagation of the error may constitute a learning cycle (epoch). The learning rate may be applied differently depending on the number of learning cycles completed. For example, a high learning rate may be used early in training so that the neural network quickly reaches a certain level of performance, improving efficiency, and a low learning rate may be used late in training to improve accuracy.
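The interaction of error, weight update, and a learning rate that decreases over the learning cycles can be seen in the smallest possible case: a single weight trained by gradient descent on labeled pairs. With only one weight, backpropagation reduces to a single gradient step. The decay schedule, the parameter names, and the sample data below are assumptions for illustration.

```python
def train(samples, epochs=50, initial_rate=0.1, decay=0.05):
    """Fit y = w * x by gradient descent with a learning rate that decays per epoch."""
    w = 0.0
    for epoch in range(epochs):
        # High rate early for fast progress, lower rate later for finer adjustment.
        rate = initial_rate / (1.0 + decay * epoch)
        for x, target in samples:
            error = w * x - target    # network output minus labeled target
            w -= rate * error * x     # gradient of 0.5 * error**2 with respect to w
    return w


# Labeled training data generated from y = 2x (supervised learning).
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(train(samples))
```

In a multi-layer network the same update is applied to every weight, with the error term for each layer obtained by propagating the output error backward through the links.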

In the training of a neural network, the training data may generally be a subset of the real data (that is, the data to be processed with the trained neural network), and there may therefore be learning cycles in which the error on the training data decreases while the error on the real data increases. Overfitting is this phenomenon, in which the network learns the training data excessively and its error on real data increases. For example, a neural network that learned what a cat is only from yellow cats and then fails to recognize a cat of another color as a cat exhibits a kind of overfitting. Overfitting can increase the error of a machine learning algorithm. Various optimization methods may be used to prevent it, such as increasing the amount of training data, regularization, or dropout, in which some of the nodes of the network are omitted during training.
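Dropout can be sketched as randomly zeroing node activations during training. The variant below is the commonly used "inverted" dropout, which rescales the surviving activations so that their expected value is unchanged; this particular variant and the function signature are assumptions, since the text only names the technique.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Zero each activation with probability `drop_prob` while training.

    Surviving activations are divided by the keep probability (inverted dropout).
    Outside training the activations pass through unchanged.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)  # seeded only to make the illustration repeatable
print(dropout([1.0, 2.0, 3.0, 4.0], drop_prob=0.5, rng=rng))
```

Because a different random subset of nodes is omitted at each step, the network cannot rely on any single node, which tends to reduce overfitting.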

Throughout this specification, the terms computational model, neural network, and network function may be used interchangeably (hereinafter, the term neural network is used throughout). A data structure may include a neural network, and a data structure including a neural network may be stored in a computer-readable medium. A data structure including a neural network may also include data input to the neural network, the weights of the neural network, the hyperparameters of the neural network, data obtained from the neural network, an activation function associated with each node or layer of the neural network, and a loss function for training the neural network. A data structure including a neural network may include any of the components listed above; that is, it may include all of them or any combination of them. In addition to these components, a data structure including a neural network may include any other information that determines the characteristics of the neural network. The data structure may also include any form of data used or generated in the computation of the neural network, and is not limited to the foregoing. The computer-readable medium may include a computer-readable recording medium and/or a computer-readable transmission medium. A neural network may be composed of a set of interconnected computational units, which may generally be referred to as nodes. These nodes may also be referred to as neurons. A neural network includes at least one node.

The data structure may include data input to the neural network. A data structure including data input to the neural network may be stored in a computer-readable medium. The data input to the neural network may include training data input during training and/or input data supplied to the trained neural network. The data input to the neural network may include data that has undergone pre-processing and/or data that is to be pre-processed. Pre-processing may include the data processing needed to input data into the neural network. Accordingly, the data structure may include both the data subject to pre-processing and the data produced by pre-processing. The above-described data structure is merely an example, and the present invention is not limited thereto.

The data structure may include the weights of the neural network (in this specification, weight and parameter may be used interchangeably). A data structure including the weights of the neural network may be stored in a computer-readable medium. The neural network may include a plurality of weights. A weight may be variable, and may be changed by a user or an algorithm so that the neural network performs a desired function. For example, when one or more input nodes are each connected to one output node by a respective link, the output node value may be determined based on the values input to the input nodes connected to that output node and the parameters set on the links corresponding to the respective input nodes. The above-described data structure is merely an example, and the present invention is not limited thereto.

By way of example and not limitation, the weights may include weights that change during training and/or weights for which training has been completed. The weights that change during training may include the weights at the start of a learning cycle and/or the weights as they change during the learning cycle. The weights for which training has been completed may include the weights at the completion of a learning cycle. Accordingly, a data structure including the weights of the neural network may include the weights that change during training and/or the weights for which training has been completed. The above-described weights and/or any combination of them are therefore regarded as included in the data structure including the weights of the neural network. The above-described data structure is merely an example, and the present invention is not limited thereto.

A data structure including the weights of the neural network may be stored in a computer-readable storage medium (for example, memory or a hard disk) after being serialized. Serialization may be the process of converting a data structure into a form that can be stored on the same or a different computing device and reconstructed for later use. A computing device may serialize a data structure in order to transmit and receive the data over a network. A serialized data structure including the weights of the neural network may be reconstructed on the same computing device or on another computing device through deserialization. The data structure including the weights of the neural network is not limited to serialization. Furthermore, it may include a data structure designed to increase computational efficiency while minimizing the use of computing resources (for example, among non-linear data structures, a B-Tree, Trie, m-way search tree, AVL tree, or Red-Black Tree). The foregoing is merely an example, and the present invention is not limited thereto.
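Serialization and deserialization of a weight structure can be sketched with Python's standard `json` module. The choice of JSON, the layer names, and the nested-list layout of the weights are assumptions for illustration; the text does not prescribe a format.

```python
import json


def serialize_weights(weights):
    """Convert a nested weight structure into bytes suitable for storage or transfer."""
    return json.dumps(weights).encode("utf-8")


def deserialize_weights(payload):
    """Rebuild the weight structure from serialized bytes."""
    return json.loads(payload.decode("utf-8"))


# Hypothetical weights: one list of link weights per node, grouped by layer.
weights = {
    "hidden": [[0.12, -0.5], [0.33, 0.9]],
    "output": [[1.5, -2.25]],
}

payload = serialize_weights(weights)      # could be written to disk or sent over a network
restored = deserialize_weights(payload)   # same or another computing device
print(restored == weights)
```

Binary formats are usually preferred for large models because they are more compact; JSON is used here only because it is human-readable and needs no third-party library.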

The data structure may include the hyperparameters of the neural network. A data structure including the hyperparameters of the neural network may be stored in a computer-readable medium. A hyperparameter may be a variable that is set by the user. Hyperparameters may include, for example, the learning rate, the cost function, the number of learning cycle repetitions, weight initialization (for example, the range of values from which the weights are initialized), and the number of hidden units (for example, the number of hidden layers and the number of nodes in each hidden layer). The above-described data structure is merely an example, and the present invention is not limited thereto.
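The hyperparameters listed above can be collected in a small immutable record that validates its values on creation. The field names and default values below are hypothetical and chosen only to mirror the examples in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperParameters:
    """User-set values that govern training (illustrative defaults)."""

    learning_rate: float = 0.01
    cost_function: str = "mean_squared_error"
    epochs: int = 100                        # number of learning cycle repetitions
    weight_init_range: tuple = (-0.1, 0.1)   # range used to initialize the weights
    hidden_layers: int = 2
    nodes_per_hidden_layer: int = 16

    def __post_init__(self):
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        low, high = self.weight_init_range
        if low >= high:
            raise ValueError("weight_init_range must be (low, high) with low < high")
        if min(self.epochs, self.hidden_layers, self.nodes_per_hidden_layer) < 1:
            raise ValueError("epochs, hidden_layers and nodes_per_hidden_layer must be >= 1")


config = HyperParameters(learning_rate=0.05, epochs=20)
print(config)
```

Keeping the record frozen makes it safe to store alongside the trained weights as a description of how the network was produced.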

The steps of a method or algorithm described in connection with an embodiment of the present invention may be implemented directly in hardware, as a software module executed by hardware, or by a combination thereof. The software module may reside in random access memory (RAM), read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

The components of the present invention may be implemented as a program (or application) and stored in a medium so as to be executed in combination with a computer, which is hardware. The components of the present invention may be implemented through software programming or software elements; similarly, embodiments may include various algorithms implemented as combinations of data structures, processes, routines, or other programming constructs, and may be implemented in a programming or scripting language such as C, C++, Java, or assembler. Functional aspects may be implemented as algorithms running on one or more processors.

A person of ordinary skill in the art will understand that the various illustrative logical blocks, modules, processors, means, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented by electronic hardware, by various forms of program or design code (referred to herein, for convenience, as "software"), or by a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. A person of ordinary skill in the art may implement the described functionality in various ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present invention.

The various embodiments presented herein may be implemented as a method, an apparatus, or an article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" includes a computer program, carrier, or media accessible from any computer-readable device. For example, computer-readable media include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips), optical disks (e.g., CDs, DVDs), smart cards, and flash memory devices (e.g., EEPROMs, cards, sticks, key drives). In addition, the various storage media presented herein include one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" includes, but is not limited to, wireless channels and various other media capable of storing, holding, and/or conveying instruction(s) and/or data.

It is to be understood that the specific order or hierarchy of steps in the presented processes is one example of exemplary approaches. It is to be understood that, based on design priorities, the specific order or hierarchy of steps in the processes may be rearranged within the scope of the present invention. The appended method claims present elements of the various steps in a sample order, but are not meant to be limited to the specific order or hierarchy presented.

The description of the presented embodiments is provided to enable any person of ordinary skill in the art to make or use the present invention. Various modifications to these embodiments will be apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other embodiments without departing from the scope of the present invention. Thus, the present invention is not limited to the embodiments presented herein, but is to be construed in the widest scope consistent with the principles and novel features presented herein.

Claims

A method of providing a response to query data based on drug quality data, the method being performed by one or more processors of a computing device and comprising:
constructing a drug quality database;
obtaining query data;
selecting one or more similar data sets by performing a search on the drug quality database based on the query data;
performing data grouping through classification of a plurality of quality data corresponding to the one or more similar data sets and the query data; and
providing drug analysis information corresponding to the query data based on a result of the data grouping,
wherein constructing the drug quality database comprises:
acquiring a plurality of quality data (OQD, Overall Quality Data) corresponding to each of a plurality of drugs; and
grouping each of the plurality of quality data into one or more data sets based on a Critical Quality Profile (CQP) corresponding to each of the plurality of quality data.

(deleted)

The method of claim 1,
wherein the quality data includes general information on a drug, profile data related to production and quality definition, and empirical data corresponding to the profile data, and
wherein the critical quality profile is information on the key factors that can determine the characteristics and properties of the drug, is used for the search, and is composed of at least a part of the quality data.

The method of claim 1,
wherein constructing the drug quality database comprises:
generating a correlation analysis model for deriving a correlation between elements by learning, through a correlation rule analysis algorithm, the correlation between the elements constituting each of the plurality of quality data; and
constructing the drug quality database by converting the plurality of quality data into metadata based on the correlation between the elements.

A method of providing a response to query data based on drug quality data, the method being performed by one or more processors of a computing device and comprising:
obtaining query data;
selecting one or more similar data sets by performing a search on a drug quality database based on the query data;
performing data grouping through classification of a plurality of quality data corresponding to the one or more similar data sets and the query data; and
providing drug analysis information corresponding to the query data based on a result of the data grouping,
wherein selecting the one or more similar data sets comprises:
selecting, from the drug quality database, one or more similar data sets whose similarity to a critical quality profile corresponding to the query data is greater than or equal to a threshold similarity score, and
wherein the threshold similarity score is calculated based on one or more similarity scores generated for respective pairs of data sets by computing similarity scores between the critical quality profiles corresponding to each of the one or more data sets.

(deleted)

The method of claim 4,
wherein performing data grouping through classification of the plurality of quality data corresponding to the one or more similar data sets and the query data comprises:
identifying one or more elements constituting each of the plurality of quality data corresponding to the one or more similar data sets; and
performing data grouping by classifying each of the plurality of quality data and the query data into one or more data groups based on the one or more elements.

The method of claim 7,
wherein providing the drug analysis information comprises:
deriving, by using the correlation analysis model, a correlation between the data group into which the query data is classified among the one or more data groups and each of the remaining data groups; and
providing drug analysis information corresponding to the query data based on the correlation between the data groups.

A computing device comprising:
a storage unit storing one or more instructions; and
a processor executing the one or more instructions stored in the storage unit,
wherein the processor performs the method of claim 1 or 5 by executing the one or more instructions.

A computer program stored in a computer-readable recording medium which, in combination with a computer that is hardware, performs the method of claim 1 or 5.