KR20240131244A

KR20240131244A - Artificial intelligence-based brand value evaluation electronic device and method thereof

Info

Publication number: KR20240131244A
Application number: KR1020230179942A
Authority: KR
Inventors: 이대성
Original assignee: 주식회사 쇼퍼하우스
Priority date: 2023-02-23
Filing date: 2023-12-12
Publication date: 2024-08-30

Abstract

An electronic device for evaluating brand value comprises a memory that stores product information and brand information, and a processor that processes the product information and the brand information. Based on processing item information and review information included in the product information, the processor may input the product information into a sales volume prediction model to output a predicted sales volume, and may input the predicted sales volume and the brand information into a brand value evaluation model to output at least one value index of the brand.

Description

{Artificial intelligence-based brand value evaluation electronic device and method thereof}

The present disclosure relates to an electronic device for web crawling-based preprocessing of training data and a method thereof, and more specifically, to such a device and method that can collect unstructured data as structured data according to fixed criteria and refine the data so that an artificial intelligence model can learn from it.

A brand aggregator is a company whose business model is to grow through economies of scale by continuously acquiring (or gathering, or cooperating with) brands of small and medium-sized businesses that have market potential and product appeal online. Such companies generally sell or market a variety of products on a single platform over the Internet. These platforms often give consumers a convenient shopping experience and let them compare and choose among various products. They can also cooperate with small and medium-sized enterprise (SME) brands that have their own online or offline brands, supporting overseas expansion and brand growth. Because analyzing SME brands from multiple perspectives is essential to a brand aggregator, it needs data that supports such analysis.

Recently, with the rapid growth of the Web, a huge amount of information is being provided through it. When the Web first appeared, it held relatively little information, so the early Web was presented in the form of static pages. The URL (Uniform Resource Locator) of a static page usually persists, so the page can be visited by following hyperlinks. A Web with these characteristics is called the Surface Web.

Many search engines have been developed to search the Surface Web. These search engines use a program called a crawler to visit web pages by following hyperlinks and to build an index of the pages visited. The index is then used to find pages that satisfy a user's query.
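The crawl-then-index flow described above can be illustrated with a minimal sketch. It is not taken from the disclosure: the in-memory `PAGES` table, the page names, and the functions `crawl` and `build_index` are hypothetical stand-ins for real HTTP fetching and a real search index.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing hyperlinks).
PAGES = {
    "a.html": ("peanut butter sale", ["b.html", "c.html"]),
    "b.html": ("almond milk", ["a.html"]),
    "c.html": ("peanut brittle", []),
}

def crawl(seed):
    """Visit pages breadth-first by following hyperlinks; return the visit order."""
    visited = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in PAGES[url][1]:
            if link not in seen:  # never enqueue the same page twice
                seen.add(link)
                queue.append(link)
    return visited

def build_index(urls):
    """Inverted index: map each word to the set of visited pages containing it."""
    index = defaultdict(set)
    for url in urls:
        for word in PAGES[url][0].split():
            index[word].add(url)
    return index

index = build_index(crawl("a.html"))
```

Looking up `index["peanut"]` then returns the pages that satisfy a one-word query, which is the role the index plays in the passage above.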

However, the current Web holds far more information than the early Web. To manage and search this information efficiently, websites now store it in back-end databases. Users generally retrieve information from a site's back-end database through the site's search function, which websites provide using the <FORM> tag of HTML (Hypertext Markup Language). The <FORM> tag passes the user's input to the web server. That is, when a user enters a query in a search form, the website searches its back-end database for information related to the query, dynamically generates a page containing the results, and shows that page to the user.

Many search engine companies currently use enormous amounts of distributed computing resources for web crawling. For example, Google (http://www.google.com) has installed hundreds of thousands of computers around the world and uses them for web crawling.

Accordingly, brand aggregators try to use web crawling to obtain the data they need to analyze SME brands from various perspectives. However, data obtained through web crawling as described above is unstructured and follows no unified, consistent standard, so it lacks the discriminating power needed for multi-perspective analysis. In particular, because the same product is traded online (in e-commerce) under various product names, data obtained through web crawling may store the same product as if it were several different products.

Meanwhile, product reviews and product ratings reflect people's varied emotions and the subjective scores that follow from them. For example, a person who values service may give a high score after receiving high-quality service even when satisfaction with the product itself is low. A rating given in this way may be meaningless, with no discriminating power, to a person who prioritizes satisfaction with the product.

Therefore, product reviews need to be made objective by analyzing each review and applying uniform criteria to specific areas such as product and service.

Meanwhile, evaluating the value of a small and medium-sized (SME) brand requires consideration of various indices. As one example of an electronic device that evaluates such value, Korean Registered Patent Publication No. 10-2488653 presents a system comprising: a financial statement DB that stores financial statements of an analyzed company for a preset analysis period; a financial identification module that identifies, from the financial statements, financial information including the net income, net assets, sales, and operating profit of the analyzed company for each year; a value calculation module that calculates, from the financial information, an overvalued value, which is the high point of the stock value, and an undervalued value, which is the low point of the stock value, of the analyzed company for each year; a company classification module that classifies a plurality of companies into company groups according to company information including industry, business type, and company size; a value prediction module including a change identification unit that graphs the change over time of each of net income, net assets, sales, and operating profit, a financial forecasting unit that calculates expected financial information, including expected net income, expected net assets, expected sales, and expected operating profit, for a specific future point in time according to those changes, a value forecasting unit that calculates the overvalued value and undervalued value of the analyzed company for the specific future point in time from the expected financial information, a trend comparison unit that compares the trends (rising, sideways, or falling) of the graphs for net income, sales, and operating profit, and an expected net income correction unit that corrects the expected net income for the specific future point in time according to the comparison result of the trend comparison unit; and a chart display module that charts the changes in the overvalued and undervalued values over time as well as the overvalued and undervalued values of the analyzed company for the specific future point in time. In that system, the expected net income correction unit further reflects, in the comparison result of the trend comparison unit, a company weight set differently for each company group to which the analyzed company belongs, and corrects the expected net income for the specific future point in time according to the magnitude of a correction value calculated through its Equation 1.

That is, although various techniques for evaluating brand value have been proposed, as described above, these conventional techniques have the following problems.

Evaluating value solely from the financial statements of an SME brand is possible only for SME brands that maintain their financial statements properly. For brands that do not, it is very difficult to evaluate brand value accurately. Moreover, financial criteria alone cannot represent the value of a brand, so the value of the brand cannot be determined accurately.

In addition, existing valuation focuses on small and medium-sized companies and considers only financial, physical, and human resources such as the company's assets, liabilities, personnel, and brand sales, whereas a brand aggregator considers only the value of the brands a company owns.

Accordingly, there is a need to develop technology that evaluates the value of a brand based not only on the brand's financial data but also on non-financial big data.

Evaluating brand value from such non-financial data is difficult in practice: e-commerce products fall into roughly 20,000 detailed categories, products in each detailed category have different attributes, and it is hard to objectively separate the evaluations that diverse customers give to products.

In addition, evaluating every product by a single uniform standard and by ratings that reflect customers' personal evaluations lacks discriminating power.

One aspect of the disclosed invention is to provide an electronic device for web crawling-based preprocessing of training data, and a method thereof, that obtain structured data following a unified, consistent standard and suitable for training an artificial intelligence model.

Another aspect of the disclosed invention is to provide a web crawling-based review analysis electronic device, and a method thereof, that can make ratings objective by analyzing product reviews and performing sentiment analysis on specific areas such as product and service.

Another aspect of the disclosed invention is to provide an artificial intelligence-based brand value evaluation electronic device, and a method thereof, that can evaluate the comprehensive value of a brand by considering evaluations other than financial criteria, awareness, expected sales, and the like.

A brand value evaluation electronic device according to one aspect of the disclosed invention includes a memory that stores product information, review information, and brand information, and a processor that processes the product information, the review information, and the financial information. Based on processing the product information, the processor may input the product information into a sales volume prediction model to output a predicted sales volume, and may input the predicted sales volume, the review information, and the brand information into a brand value evaluation model to output at least one value index of the brand.

In addition, the processor may input the product information into a first sales volume prediction model to output a first predicted sales volume, input the product information into a second sales volume prediction model to output a second predicted sales volume, and output the predicted sales volume based on the first predicted sales volume and the second predicted sales volume.

In addition, the first sales volume prediction model may be a model trained based on a TCN (Temporal Convolutional Network) algorithm, and the second sales volume prediction model may be a model trained based on an LSTM (Long Short-Term Memory) algorithm.

In addition, the predicted sales volume may be time-series data of predicted sales volume for each prediction date, and outputting the predicted sales volume may consist of assigning weights to the first predicted sales volume and the second predicted sales volume and outputting the predicted sales volume based on the weighted first and second predicted sales volumes.

In addition, the processor may assign the weight of the first predicted sales volume in inverse proportion to the prediction period, and may assign the weight of the second predicted sales volume in proportion to the prediction period.
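The weighting described above can be sketched as follows. This is only an illustration under stated assumptions: the disclosure says the first (TCN) weight is inversely proportional to the prediction period and the second (LSTM) weight is proportional to it, but the function name `blend_forecasts`, the use of the day index `h` as the prediction period, and the normalization so that the two weights sum to 1 are assumptions made here.

```python
def blend_forecasts(tcn_forecast, lstm_forecast):
    """Blend two per-day forecasts with horizon-dependent weights.

    Assumption: the prediction period is the 1-based day index h, the TCN
    weight is 1/h, the LSTM weight is h, and the pair is normalized to sum to 1.
    """
    if len(tcn_forecast) != len(lstm_forecast):
        raise ValueError("forecasts must cover the same prediction dates")
    blended = []
    for h, (tcn, lstm) in enumerate(zip(tcn_forecast, lstm_forecast), start=1):
        w_tcn = 1.0 / h    # inversely proportional to the prediction period
        w_lstm = float(h)  # proportional to the prediction period
        blended.append((w_tcn * tcn + w_lstm * lstm) / (w_tcn + w_lstm))
    return blended

# Hypothetical 3-day forecasts from the two models.
blended = blend_forecasts([100.0, 100.0, 100.0], [200.0, 200.0, 200.0])
```

Under these assumptions the blend starts midway between the two models on day 1 and moves toward the LSTM forecast as the horizon grows, so the first model dominates short horizons less and less.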

In addition, the sales volume prediction model may be trained based on a plurality of product names corresponding to a category and first item information corresponding to each of the product names.

In addition, the sales volume prediction model may further include an embedding layer and be trained with it.

In addition, the processor may use integrated gradients to identify the contribution of each piece of data included in the product information that is input to the sales volume prediction model, and may transmit data whose contribution is greater than a predetermined value to the user terminal.
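As a rough illustration of the attribution step above, the following sketch approximates integrated gradients with a midpoint Riemann sum and then keeps the features whose contribution exceeds a threshold. The toy model, its weights, the feature names, the all-zero baseline, and the threshold value are all hypothetical; the disclosure does not specify them, and a real sales volume prediction model would supply gradients through an automatic differentiation library.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate IG_i = (x_i - b_i) * integral over alpha in [0, 1] of
    dF/dx_i evaluated at b + alpha * (x - b), using a midpoint Riemann sum."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            avg_grad[i] += g / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]

# Toy stand-in for a trained model (hypothetical): F(x) = sum(w_i * x_i**2).
WEIGHTS = [0.5, 2.0, 0.1]

def toy_grad(point):
    """Analytic gradient of the toy model at the given point."""
    return [2.0 * w * xi for w, xi in zip(WEIGHTS, point)]

# Hypothetical input features drawn from the product information.
features = {"price": 1.0, "review_count": 2.0, "rank": 3.0}
attributions = integrated_gradients(toy_grad, list(features.values()), [0.0, 0.0, 0.0])

THRESHOLD = 1.0  # the "predetermined value"; chosen arbitrarily here
important = [name for name, a in zip(features, attributions) if a > THRESHOLD]
```

For this toy model the attributions sum to the difference between the model output at the input and at the baseline, which is the completeness property that makes integrated gradients a natural way to rank input contributions.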

In addition, the brand value evaluation model may be trained to take the predicted sales volume, the review information, and the brand information as inputs and to output a review index score, a rating index score, a ranking index score, and a profit index score.

Meanwhile, a brand value evaluation method according to one aspect of the disclosed invention may include: storing product information and brand information; inputting the product information into a sales volume prediction model to output a predicted sales volume, based on processing item information and review information included in the product information; and inputting the predicted sales volume and the brand information into a brand value evaluation model to output at least one value index of the brand.

According to one aspect of the disclosed invention, an electronic device for web crawling-based preprocessing of training data, and a method thereof, can be provided that obtain structured data following a unified, consistent standard and suitable for training an artificial intelligence model.

In addition, according to one aspect of the disclosed invention, a web crawling-based review analysis electronic device, and a method thereof, can be provided that analyze product reviews, make ratings objective with respect to specific areas such as product, service, and category-specific attributes, and derive the influence of those ratings on future sales forecasts.

In addition, according to one aspect of the disclosed invention, an artificial intelligence-based brand value evaluation electronic device, and a method thereof, can be provided that use non-financial big data to evaluate the comprehensive value of a brand by considering evaluations other than financial criteria, awareness, expected sales, and the like.

FIG. 1 is a conceptual diagram for explaining an artificial intelligence-based brand value evaluation system according to one embodiment.
FIG. 2 is a block diagram showing the configuration of an electronic device according to one embodiment.
FIG. 3 is a diagram for explaining product information stored in advance by an electronic device according to one embodiment.
FIG. 4 is a conceptual diagram for explaining time-series data of product information stored by an electronic device according to one embodiment.
FIG. 5 is a diagram for explaining collected information gathered by an electronic device according to one embodiment.
FIG. 6 is a diagram for explaining collected information gathered by an electronic device according to one embodiment.
FIG. 7 is a diagram for explaining missing values in the collected information gathered by an electronic device according to one embodiment.
FIG. 8 is a diagram for explaining a sales volume prediction model of an electronic device according to one embodiment.
FIG. 9 is a diagram for explaining review information collected by an electronic device according to one embodiment.
FIG. 10 is a diagram for explaining how an electronic device according to one embodiment identifies representative sentences in review information.
FIG. 11 is a diagram for explaining training data for training a rating extraction model of an electronic device according to one embodiment.
FIG. 12 is a diagram for explaining an example of rating extraction by an electronic device according to one embodiment.
FIG. 13 is a flowchart for explaining a training data preprocessing method according to one embodiment.
FIG. 14 is a flowchart for explaining a training data preprocessing method according to one embodiment.
FIG. 15 is a flowchart for explaining a review analysis method according to one embodiment.
FIG. 16 is a flowchart for explaining a brand value evaluation method according to one embodiment.
FIG. 17 is a flowchart for explaining a brand value evaluation method according to one embodiment.

Like reference numerals refer to like elements throughout the specification. This specification does not describe every element of the embodiments, and content that is general in the technical field of the disclosed invention or that is duplicated between embodiments is omitted. The terms 'unit, module, member, block' used in the specification may be implemented in software or hardware, and depending on the embodiment, a plurality of 'units, modules, members, blocks' may be implemented as a single component, or a single 'unit, module, member, block' may include a plurality of components.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only a direct connection but also an indirect connection, and an indirect connection includes a connection through a wireless communication network.

In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Throughout the specification, when a member is said to be located "on" another member, this includes not only the case where the member is in contact with the other member but also the case where a further member exists between the two.

Terms such as first and second are used to distinguish one component from another, and the components are not limited by these terms.

A singular expression includes the plural unless the context clearly indicates otherwise.

The identification codes attached to the steps are used for convenience of explanation and do not describe the order of the steps. The steps may be performed in an order different from the stated order unless the context clearly specifies a particular order.

Hereinafter, the operating principle and embodiments of the disclosed invention are described with reference to the accompanying drawings.

Brand aggregators try to use web crawling to obtain the data they need to analyze SME brands from various perspectives. However, data obtained through web crawling is unstructured and follows no unified, consistent standard, so it lacks the discriminating power needed for multi-perspective analysis. In particular, because the same product is traded online (in e-commerce) under various product names, data obtained through web crawling may store the same product as if it were several different products.

The present invention may address the problem of data analysis that lacks discriminating power because it relies on unstructured data. It does so by preprocessing data obtained from web pages through web crawling into a form suitable for training an artificial intelligence model, thereby obtaining structured data that can be analyzed from various perspectives.

In addition, product reviews and product ratings reflect people's varied emotions and the subjective scores that follow from them. For example, a person who values service may give a high score after receiving high-quality service even when satisfaction with the product itself is low. A rating given in this way may be meaningless, with no discriminating power, to a person who prioritizes satisfaction with the product.

The present invention may address the problem of meaningless ratings that lack discriminating power. It analyzes review information obtained from web pages through web crawling and extracts from each review the sentiment toward the product, the sentiment toward the service, the overall sentiment, and so on. Ratings that are valid or useless only for particular people are then revised against these more objective criteria to obtain more objective rating data.

In addition, evaluating brand value solely from the financial statements of an SME brand is possible only for SME brands that maintain their financial statements properly. For brands that do not, it is very difficult to evaluate brand value accurately. Moreover, financial criteria alone cannot represent the value of a brand, so the value of the brand cannot be determined accurately.

The present invention may address the above problems by evaluating brand value from multiple angles using various indices, such as more objective review data and a predicted sales volume forecast from time-series sales volume data.

FIG. 1 is a conceptual diagram for explaining an artificial intelligence-based brand value evaluation system (1000) (hereinafter referred to as 'the system' for convenience of explanation) according to one embodiment.

Referring to FIG. 1, the system (1000) according to one embodiment of the present disclosure may include an electronic device (100) (hereinafter referred to as 'the device' for convenience of explanation) that receives web page data from a server (200) and processes the web page data, the server (200) that transmits the web page data to the device (100), and a user terminal (300) that can transmit data to and receive data from the server (200) and/or the device (100).

The system (1000) according to one embodiment of the present disclosure can obtain and process data existing in the server (200). That is, the system (1000) can request data from the server (200), and obtain and process the requested data, so as to acquire data that is continuously and/or periodically updated in the server (200).

Meanwhile, the system (1000) can store product information including at least one product name and first item information corresponding to the product name. That is, the product information can take the form of metadata consisting of data corresponding to the product name and, as time-series data, can be accumulated according to the times at which the data acquired by the system (1000) from the server (200) is updated. Here, the product name may, for example, be referred to as a product ID. That is, the product name can be understood as a unique identification code for a specific product.

Meanwhile, as will be described in detail below, the product information may include at least one product name and first item information corresponding to the product name. More specifically, the product information may be divided into groups according to a plurality of categories and may hold information about at least one group. For example, the product information may include, under the category of nuts, a plurality of products such as peanuts, chestnuts, walnuts, almonds, pine nuts, ginkgo nuts, pistachios, and cashew nuts. That is, the product information may include a group for at least one category, the product names of the products belonging to the group, and the first item information corresponding to each product name.

For example, the first item information may include at least one of the following for a product name in distribution: sales volume (revenue), cumulative number of reviews, rank, price, link, rating, shipping fee, category, brand name or manufacturer name, number of wish-list adds, release date, and the like. Here, the category included in the first item information or the second item information corresponding to a product name may mean, for example, an identification value for the category to which the product name belongs. For example, if a first product belongs to a first category, the first item information for the first product may include the attribute (identification value or group) "first category". Accordingly, the category group to which a product name belongs can be identified through the product name and the category attribute (identification value or group) included in the corresponding first item information or second item information.

Based on processing the acquired data, the system (1000) according to one embodiment of the present disclosure can identify collected information including at least one product name and second item information corresponding to the product name. The second item information is so named only to distinguish it from the first item information of the product information; it is not limited thereto and may include items for the same information as the first item information. That is, the first item information and the second item information may both be referred to simply as item information.

For example, if the first item information includes items for sales volume, cumulative number of reviews, and price, the second item information may include the same items as the first item information: sales volume, cumulative number of reviews, and price. More specifically, the product information may be, for example, time-series data in which the collected information is accumulated. In other words, the product information may be the information stored by accumulating the collected information according to the time at which it was collected.

Based on processing the identified collected information, the system (1000) according to one embodiment of the present disclosure can delete data for duplicate product names. That is, duplicate product names within the webpage data may be filtered out so that duplicated data is not added to the accumulated data.

More specifically, the webpage data may include a plurality of product names and item information corresponding to the product names. However, the webpage data may contain multiple pieces of item information for the same product name for various reasons, such as a seller's advertisement registration. Accordingly, the system (1000) can process the collected information identified from the webpage data and delete the repeated product names, together with the item information corresponding to them, thereby handling duplicate data.

For example, suppose a first page of the webpage data includes a first product and item information corresponding to the first product, and a second page of the webpage data also includes the first product and item information corresponding to the first product. Based on processing the webpage data, the system (1000) may identify collected information that contains two entries for the first product, each with its corresponding item information. The system (1000) may then delete one of the two first product names together with its corresponding item information. In this way, duplicate data can be filtered and/or deleted.

In addition, it can be understood from the above description that the system (1000) can clean up duplicate review information by deleting duplicated data based on the author ID in the review information included in the collected information. For example, based on processing product information that includes item information and review information corresponding to a specific product name, the system (1000) can delete the review data corresponding to a repeated author ID among the plurality of author IDs included in the review information. However, the present disclosure is not limited thereto.
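The two de-duplication steps described above (by product name for listings, by author ID for reviews) can be illustrated with a minimal Python sketch. The record layout, the field names `product_name` and `author_id`, and the helper `dedupe` are hypothetical, and keeping the first occurrence is an assumption; the disclosure only states that one of the duplicates is deleted.

```python
from typing import Callable, Hashable, Iterable


def dedupe(records: Iterable[dict], key: Callable[[dict], Hashable]) -> list[dict]:
    """Keep the first record seen for each key value and drop later repeats."""
    seen: set[Hashable] = set()
    unique: list[dict] = []
    for record in records:
        k = key(record)
        if k in seen:
            continue  # a record with this key was already collected
        seen.add(k)
        unique.append(record)
    return unique


# Hypothetical collected information: the first product appears on two pages.
listings = [
    {"product_name": "first product", "page": 1, "price": 15900},
    {"product_name": "second product", "page": 1, "price": 8900},
    {"product_name": "first product", "page": 2, "price": 15900},
]

# Hypothetical review information: user_a's review was collected twice.
reviews = [
    {"product_name": "first product", "author_id": "user_a", "text": "Good"},
    {"product_name": "first product", "author_id": "user_a", "text": "Good"},
    {"product_name": "first product", "author_id": "user_b", "text": "Fine"},
]

unique_listings = dedupe(listings, key=lambda r: r["product_name"])
unique_reviews = dedupe(reviews, key=lambda r: (r["product_name"], r["author_id"]))
```

Keying reviews on the pair of product name and author ID keeps reviews that the same author wrote for different products, which is one possible reading of "duplicate" here.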

Based on processing the pre-stored product information and the collected information, the system (1000) according to one embodiment of the present disclosure can determine whether the product information and the collected information refer to the same product. More specifically, this determination may be made by processing at least one product name included in the product information and at least one product name included in the collected information.

For example, when the product information includes a product name "first product" and first item information corresponding to the first product, the system (1000) may, based on processing webpage data received from the server (200), identify collected information including a product name "second product" and second item information corresponding to the second product, and determine whether the first product and the second product are the same product. Meanwhile, since the product information and the collected information each include at least one product name and corresponding item information, each may include a plurality of product names and corresponding item information.

In the above-described embodiment, the system (1000) can determine, individually for each pair, whether a product name included in the product information and a product name included in the collected information identified from the webpage data refer to the same product.

Accordingly, the system (1000) can update the product information in response to determining that the product information and the collected information refer to the same product. More specifically, in response to determining that a product name included in the product information and a product name included in the collected information refer to the same product, the system can update (in time series) and store the product information corresponding to that product name based on the collected information.

Meanwhile, the system (1000) may determine whether products are the same in order to handle, for example, cases in which suppliers (or vendors) use different product names for the same product (such as by adding extra descriptive wording), thereby preventing the generation of unstructured data.

Based on processing the acquired data, the system (1000) according to one embodiment of the present disclosure can identify collected information including at least one product name and at least one piece of review information corresponding to the product name. More specifically, based on processing the webpage data received from the server (200), the system (1000) can identify collected information, contained in the webpage data, that includes at least one product name and second item information corresponding to the product name. That is, the item information is data in the form of metadata corresponding to each product name, and the second item information may include, for example, review information about reviews written by users of the product corresponding to the product name.

Accordingly, the second item information included in the collected information acquired by the system (1000) may further include review information. More specifically, the review information may include, for example, a plurality of author IDs corresponding to a specific product name and, for each ID, customer review text, product photos, a rating, a number of wish-list adds, a number of likes, a posting date, a category, a sales option, a reference rating, and the like. Here, the reference rating may be a rating output when the system (1000) inputs the review information, excluding the reference rating, into a rating extraction model; the output rating is then added to the review information as the reference rating.

More specifically, based on processing the collected information, the system (1000) can identify at least one representative sentence of the review information and input the representative sentence into the rating extraction model, described in detail below, to extract a rating (reference rating) for the review information.

Meanwhile, based on processing the pre-stored product information and brand information, the system (1000) according to one embodiment of the present disclosure can input the product information into a sales volume prediction model, described in detail below, and output a predicted sales volume. The system (1000) can then input the output (extracted) predicted sales volume and the brand information into a brand value evaluation model, also described in detail below, and output (extract) at least one value indicator of the brand.
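The two-stage flow described above (product information into a sales volume prediction model, then the predicted sales volume plus brand information into a brand value evaluation model) can be sketched as follows. This is only an illustration of the data flow: the function names, the `average_price` field, and both toy models are placeholders invented for this sketch and are not the models of the disclosure.

```python
from typing import Callable, Mapping, Sequence

# Type aliases for the two models; both are treated as opaque callables.
SalesModel = Callable[[Sequence[float]], float]
BrandValueModel = Callable[[float, Mapping[str, float]], dict[str, float]]


def evaluate_brand(
    sales_history: Sequence[float],
    brand_info: Mapping[str, float],
    sales_model: SalesModel,
    value_model: BrandValueModel,
) -> dict[str, float]:
    """Stage 1 predicts sales volume; stage 2 turns it into value indicators."""
    predicted_sales = sales_model(sales_history)
    return value_model(predicted_sales, brand_info)


def mean_of_last_three(history: Sequence[float]) -> float:
    """Placeholder sales model: average of the three most recent periods."""
    recent = list(history)[-3:]
    return sum(recent) / len(recent)


def toy_value_model(predicted_sales: float, brand_info: Mapping[str, float]) -> dict[str, float]:
    """Placeholder value model: a single indicator, predicted sales times price."""
    return {"predicted_revenue": predicted_sales * brand_info["average_price"]}


indicators = evaluate_brand(
    sales_history=[100, 120, 140, 160],
    brand_info={"average_price": 10.0},
    sales_model=mean_of_last_three,
    value_model=toy_value_model,
)
```

Passing the models in as callables keeps the pipeline independent of how either model is trained or implemented.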

The device (100), the server (200), and the user terminal (300) of the system (1000) according to one embodiment of the present disclosure can be connected through a network (1). Here, being connected through the network (1) can be understood to mean being connected electrically or communicatively.

Examples of the network (1) include, but are not limited to, 3GPP (3rd Generation Partnership Project) networks (such as a 3G network, a 4G or LTE (Long Term Evolution) network, a 5G or NR (New Radio) network, or a 6G network), a WiMAX (Worldwide Interoperability for Microwave Access) network, the Internet, a LAN (Local Area Network), a wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Bluetooth network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network.

The device (100) can transmit data to and/or receive data from the server (200). More specifically, the device (100) can transmit to the server (200) a request message requesting data transmission, and receive from the server (200) the data corresponding to the request message. The device (100) can then identify collected information based on processing the received data. More specifically, based on processing the received webpage data, the device (100) can identify collected information including at least one product name and second item information, which is the item information corresponding to the product name.

Meanwhile, the data that the device (100) requests from the server (200) may include, for example, webpage data and/or image data and/or text data. In another embodiment, the data may be, for example, metadata including search results corresponding to a specific keyword. However, the present disclosure is not limited thereto.

Meanwhile, the requested data, including webpage data, that the device (100) acquires from the server (200) may be obtained using, for example, a multi-threading method in which a plurality of threads collect webpage data for multiple webpages, a multi-processing method in which at least two processes each collect webpage data for a separate webpage, or a distributed crawling method in which a plurality of devices (100) each independently collect webpage data. However, the present disclosure is not limited thereto. That is, it can be understood that any known web crawling method and/or any web crawling method developed in the future may be used to acquire the webpage data from the server (200).
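As an illustration of the multi-threading method mentioned above, the following minimal sketch fetches several pages concurrently with Python's standard library. The function names `fetch_page` and `crawl` and the `max_workers` default are assumptions made for this sketch; the disclosure does not prescribe a particular library or thread count.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable
from urllib.request import urlopen


def fetch_page(url: str, timeout: float = 10.0) -> bytes:
    """Download one page and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def crawl(
    urls: list[str],
    fetch: Callable[[str], bytes] = fetch_page,
    max_workers: int = 8,
) -> dict[str, bytes]:
    """Fetch all URLs using a pool of worker threads.

    Executor.map yields results in input order, so zipping with `urls`
    pairs each body with its URL. An exception raised by `fetch` is
    re-raised here when that result is consumed.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        bodies = pool.map(fetch, urls)
        return dict(zip(urls, bodies))
```

Threads suit this workload because page downloads are I/O-bound. The multi-processing variant could be sketched by swapping in `ProcessPoolExecutor`, which has the same `map` interface but requires a picklable `fetch` function.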

The server (200) according to one embodiment of the present disclosure can process and store various types of data, and transmit and/or receive the processed data. More specifically, the server (200) can be connected to the device (100) and the user terminal (300) via the network (1). Accordingly, the server (200) can transmit and/or receive, over the network (1), the data needed to link data among them. The server (200) may be, for example, a platform and/or a webpage cloud server, but is not limited thereto.

The server (200) may, for example, be integrated with a database. In response to a request from the device (100), it may search for the data corresponding to the request, generate the requested specific information based on the retrieved information, and transmit the generated information back to the device (100).

The server (200) may exist as a single server or may be composed of a plurality of servers. In addition, the server (200) may include at least one processor that processes various types of data.

The user terminal (300) according to one embodiment of the present disclosure may include, but is not limited to, any type of wired or wireless communication device capable of input and output, such as a PCS (Personal Communication System), GSM (Global System for Mobile communication), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (Wideband Code Division Multiple Access), or WiBro (Wireless Broadband Internet) terminal, a smartphone, a smart pad, a tablet PC, a laptop, a wearable device, or digital signage.

The device (100), the server (200), and the user terminal (300) that constitute the system (1000) have been briefly described above. The configuration of the system (1000) is described in detail below with reference to the drawings. Meanwhile, the embodiments described below as performed by the device (100) may also be understood as embodiments performed by the system (1000) that includes the device (100).

FIG. 2 is a block diagram showing the configuration of an electronic device according to one embodiment.

Referring to FIG. 2, the device (100) according to one embodiment of the present disclosure may include a communication unit (110) and a control unit (120), the control unit (120) including a processor (121) and a memory (122).

The communication unit (110) according to one embodiment of the present disclosure can communicate with the server (200) via the network (1), request data including webpage data from the server (200), and receive the requested data from the server (200). In addition, the communication unit (110) may be configured as a wireless communication module of a known type.

The control unit (120) may include the processor (121) and the memory (122).

The processor (121) can process data, including webpage data, obtained through the communication unit (110), as well as product information pre-stored in the memory (122). For example, the processor (121) may include a digital signal processor (DSP) and/or a microcontroller unit (MCU).

FIG. 3 is a diagram for explaining product information pre-stored by an electronic device according to one embodiment. FIG. 4 is a conceptual diagram for explaining time-series data of the product information stored by an electronic device according to one embodiment.

Referring to FIG. 3, the processor (121) according to one embodiment of the present disclosure can process product information (21) pre-stored in the memory (122). More specifically, the product information (21) may include at least one product name and item information (31) corresponding to the product name. The item information (31) may be, for example, metadata corresponding to the product name that describes the details of the product with that name.

For example, the product information (21) may include a product name and item information (31) that includes, for that product name, a category, rank, sales volume, price (41), link, rating, shipping fee, brand name, manufacturer, cumulative number of reviews, acquisition time (the time at which the webpage data was acquired), review information (51), and the like.

The product information (21) may be big data, in accumulated metadata form, of the collected information that the processor (121) identifies by processing webpage data acquired from the server (200), where the collected information includes at least one product name and second item information corresponding to the product name. That is, the product information (21) may be data accumulated in time series according to the times at which the processor (121) acquired the collected information.

FIG. 4 is a diagram for explaining time-series data of the first item information for at least one product name included in the product information. Referring to FIG. 4, among the product information pre-stored in the memory (122) according to one embodiment of the present disclosure, the information on price (41) may be accumulated according to the times at which the processor (121) acquired webpage data from the server (200). However, the present disclosure is not limited thereto; in other embodiments, the information on rank, sales volume, rating, and cumulative number of reviews included in the item information (31) may likewise be accumulated according to acquisition time, in the same manner as the price (41). That is, the disclosure is not limited to what is shown in FIG. 4.
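The time-series accumulation described for FIG. 3 and FIG. 4 can be sketched as a store that appends one snapshot of item information per acquisition time and then reads back a single field, such as price, as a series. The store layout, the helper names `append_snapshot` and `time_series`, and the sample values are hypothetical and serve only as illustration.

```python
from datetime import date


def append_snapshot(store: dict, product_id: str, acquired_on: date, item_info: dict) -> None:
    """Add one collected snapshot to a product's history, kept oldest first."""
    history = store.setdefault(product_id, [])
    history.append({"acquired_on": acquired_on, **item_info})
    history.sort(key=lambda snapshot: snapshot["acquired_on"])


def time_series(store: dict, product_id: str, field: str) -> list[tuple]:
    """Return (acquisition date, value) pairs for one item field."""
    return [
        (snapshot["acquired_on"], snapshot[field])
        for snapshot in store.get(product_id, [])
        if field in snapshot
    ]


store: dict = {}
# Snapshots may arrive out of order; the history is re-sorted on insert.
append_snapshot(store, "P-001", date(2023, 2, 2), {"price": 15500, "rank": 3})
append_snapshot(store, "P-001", date(2023, 2, 1), {"price": 15900, "rank": 4})

prices = time_series(store, "P-001", "price")
```

The same `time_series` call with `"rank"` in place of `"price"` would return the rank history, matching the remark that other item fields can be accumulated in the same way.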

FIG. 5 is a diagram for explaining the collected information gathered by an electronic device according to one embodiment.

Referring to FIG. 5, the processor (121) according to one embodiment of the present disclosure may process webpage data (61) in response to receiving the webpage data (61) from the server (200) through the communication unit (110). Based on processing the webpage data (61), the processor (121) may identify collected information including at least one product name and second item information corresponding to the product name.

Meanwhile, the collected information is, for example, information obtained by processing webpage data received from at least one server (200). When webpage data is received from different servers (200) and/or from the same server (200), the collected information may include, for the same product, different product names and item information corresponding to each of those product names.

For example, referring to (a) of FIG. 5, the processor (121) according to one embodiment of the present disclosure can identify a first product name, 'Lactofit Gold 2gX50', based on processing the webpage data (61). Referring to (b) of FIG. 5, the processor (121) can identify a second product name, 'Lactofit Gold Probiotics 2gX50', based on processing the webpage data (61). In this case, the first product name and the second product name differ, but they refer to the same product. That is, the webpage data (61) may include collected information in which the same product appears under different product names.

Accordingly, based on processing the pre-stored product information and the identified collected information, the processor (121) can determine whether the product information and the collected information refer to the same product. More specifically, the processor (121) can identify the similarity between at least one product name included in the product information and at least one product name included in the collected information, and determine, based on the similarity, whether the two product names refer to the same product. That is, as illustrated in FIG. 5, the processor (121) can identify the similarity between at least one product name included in the product information and a product name included in the identified collected information (for example, the first product name or the second product name).

More specifically, the processor (121) can calculate a sentence embedding of a product name included in the product information or the collected information.

Sentence embedding refers to converting text into a numeric vector in the field of natural language processing (NLP). For example, the processor (121) may calculate a sentence embedding using at least one of: an average pooling method that generates a sentence vector by averaging all word vectors in the sentence; a max pooling method that generates a sentence vector from the vector having the largest value among all word vectors in the sentence; a method that takes the word order of the sentence into account using a recurrent architecture such as a recurrent neural network (RNN) or long short-term memory (LSTM); and a method that uses a transformer-based model such as BERT or GPT. However, the present disclosure is not limited thereto, and any publicly known sentence embedding method, or one developed in the future, may be applied.
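
As a non-limiting illustration, the two pooling methods mentioned above may be sketched as follows. The sketch assumes that word vectors have already been obtained from some word embedding model, which the disclosure does not specify, and it uses the common element-wise form of max pooling, which is one possible reading of the description above. The function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def average_pooling(word_vectors: Sequence[Sequence[float]]) -> Vector:
    """Sentence vector as the element-wise mean of the word vectors."""
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    count = len(word_vectors)
    # zip(*...) groups the values of each dimension across all words.
    return [sum(column) / count for column in zip(*word_vectors)]


def max_pooling(word_vectors: Sequence[Sequence[float]]) -> Vector:
    """Sentence vector as the element-wise maximum of the word vectors
    (assumed interpretation of the max pooling method)."""
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    return [max(column) for column in zip(*word_vectors)]
```

Both functions return a vector with the same dimensionality as the input word vectors, so the resulting sentence vectors can be compared directly with one another.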

According to one embodiment of the present disclosure, the processor (121) may calculate, based on processing the product information, a sentence embedding of at least one product name included in the product information. That is, the processor (121) may calculate a sentence embedding for each of a plurality of pre-stored product names and store the embeddings in the memory (122).

In addition, the processor (121) may calculate, based on processing the collection information, a sentence embedding of at least one product name included in the collection information.

Accordingly, the processor (121) may calculate the cosine similarity between the sentence embedding calculated for at least one product name included in the product information and the sentence embedding calculated for at least one product name included in the collection information.

Cosine similarity is one way of measuring the similarity between two vectors based on the angle between them: a value closer to 0 indicates a larger angle between the vectors, and a value closer to 1 indicates a smaller angle. That is, the processor (121) may identify the similarity between two product names by calculating the cosine similarity between the sentence embedding of a product name included in the product information and the sentence embedding of a product name included in the collection information.

In response to the identified similarity (for example, the cosine similarity value) being greater than a predetermined value, the processor (121) may determine that the product name included in the product information and the product name included in the collection information denote the same product. Conversely, in response to the identified similarity being less than the predetermined value, the processor (121) may determine that the two product names denote different products.
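
As a non-limiting illustration, the cosine similarity comparison against a predetermined value may be sketched as follows. The function names and the default threshold of 0.9 are assumptions introduced for illustration; the disclosure states only that the predetermined value is obtained experimentally or empirically. The case in which the similarity exactly equals the threshold is not addressed in the description and is treated here as "different".

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; report no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


def is_same_product(stored_embedding: Sequence[float],
                    collected_embedding: Sequence[float],
                    threshold: float = 0.9) -> bool:
    """True when the similarity exceeds the (hypothetical) threshold."""
    return cosine_similarity(stored_embedding, collected_embedding) > threshold
```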

Meanwhile, the predetermined value is obtained experimentally or empirically, and may differ depending on the product or the product category. Accordingly, the predetermined value may be set so that entries for the same product listed under similar product names are determined to be the same product.

Accordingly, the processor (121) may calculate sentence embeddings for at least one product name included in the product information and at least one product name included in the collection information, and may determine whether the products are the same in response to calculating the cosine similarity between the sentence embedding of the product name included in the product information and the sentence embedding of the product name included in the collection information.

According to one embodiment of the present disclosure, in response to determining that the cosine similarity between the sentence embedding of a product name included in the product information and the sentence embedding of a product name included in the collection information is greater than the predetermined value, the processor (121) may update the first item information corresponding to the product name included in the product information based on the second item information corresponding to the product name included in the collection information.

For example, the processor (121) may update the price, which is first item information corresponding to a first product in the product information, with the price that is second item information corresponding to the first product in the collection information. That is, for a product name determined to denote the same product, the processor (121) may change (update) the price in the first item information of the product information to the price in the second item information of the collection information.

In another embodiment, the processor (121) may cumulatively store, in the product information, the price that is second item information corresponding to the first product in the collection information. That is, as illustrated in FIG. 4, the processor (121) may accumulate the collection information, by acquisition time, in the product information for a product name determined to denote the same product. However, the present disclosure is not limited thereto.
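
As a non-limiting illustration, the two storage embodiments described above (overwriting the stored price versus accumulating prices by acquisition time) may be sketched as follows. The dictionary layout, the ISO date strings, and the function names are assumptions introduced for illustration only.

```python
from typing import Dict, List, Tuple


def overwrite_price(latest_price: Dict[str, float],
                    product_name: str, collected_price: float) -> None:
    """First embodiment: replace the stored price with the collected one."""
    latest_price[product_name] = collected_price


def accumulate_price(price_history: Dict[str, List[Tuple[str, float]]],
                     product_name: str, acquired_at: str,
                     collected_price: float) -> None:
    """Second embodiment: append (acquisition time, price) to the history,
    so the stored product information becomes time-series data."""
    price_history.setdefault(product_name, []).append(
        (acquired_at, collected_price))
```

The accumulated form keeps every observation, which is what allows the period averages and quartiles discussed later in the description to be computed.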

FIG. 6 is a diagram for explaining the collection information collected by an electronic device according to one embodiment. FIG. 7 is a diagram for explaining missing values in the collection information collected by an electronic device according to one embodiment. FIG. 8 is a diagram for explaining a sales volume prediction model of an electronic device according to one embodiment.

The processor (121) according to one embodiment of the present disclosure may identify the collection information based on processing the web page data received from the server (200). The processor (121) may then identify, based on processing the collection information, at least one missing value among the second item information included in the collection information. That is, the web page data may include at least one product name and collection information corresponding to that product name, but when specific information is absent from the web page data, the processor (121) may identify a missing value (null) in the collection information.

The missing value correction process may be, for example, a process for preventing the loss of reliability that can occur when an artificial intelligence model is trained on collection information containing missing values. Accordingly, the missing value correction process may calculate a highly reliable correction value for a missing value and correct the missing value with that correction value, thereby supplementing the data that is absent.

Meanwhile, in response to identifying at least one missing value among the second item information by processing the collection information, the processor (121) may perform the missing value correction process and identify a correction value based on processing the first item information corresponding to the missing value. More specifically, in response to identifying a missing value among the second item information for a specific product name, the processor (121) may identify the correction value based on the value, among the first item information for that product name, that corresponds to the missing value. That is, the processor (121) may identify the correction value from the first item information of the product information whose product name has been determined to denote the same product, for the item in which the missing value was identified.

The processor (121) according to one embodiment may identify, based on processing the collection information, the first item information of the product information corresponding to the identified missing value. That is, when a missing value is identified among the second item information for a specific product name in the collection information, the processor (121) may identify, for the item in which the missing value occurred, the value of the same item in the first item information of the product information for that product name (determined to be the same product).

In this case, the processor (121) may identify, as the correction value, the most recent value in the first item information for the item corresponding to the missing value of the collection information. For example, in response to identifying a missing value for the rating (one of the items of item information) based on processing the collection information, the processor (121) may identify the rating in the first item information corresponding to the most recent acquisition time as the correction value, and correct the missing value of the collection information with that correction value. Accordingly, a missing value in the collection information may be corrected using the most recent first item information for the corresponding item.

In another embodiment, the processor (121) may identify, as the correction value, the average over a predetermined period of the first item information corresponding to the missing value. More specifically, in response to identifying a missing value for the shipping cost among the second item information of the collection information, the processor (121) may identify, as the correction value, the average of the shipping cost values over the predetermined period among the first item information of the product information accumulated by acquisition time. Accordingly, the processor (121) may correct the missing value of the collection information using, as the correction value, the average over the predetermined period of the accumulated product information for the item corresponding to the missing value.
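
As a non-limiting illustration, the two correction values described above (the most recent stored value and the average over a predetermined period) may be sketched as follows. The history layout, the function names, and the expression of the period as a number of days are assumptions introduced for illustration.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Accumulated first item information for one item of one product:
# (acquisition date, value or None when the value was not observed).
History = List[Tuple[date, Optional[float]]]


def most_recent_value(history: History) -> Optional[float]:
    """Correction value taken from the latest acquisition with a value."""
    known = [(day, value) for day, value in history if value is not None]
    if not known:
        return None
    return max(known, key=lambda entry: entry[0])[1]


def period_average(history: History, as_of: date,
                   days: int) -> Optional[float]:
    """Correction value as the mean over the last `days` days up to `as_of`."""
    start = as_of - timedelta(days=days)
    values = [value for day, value in history
              if value is not None and start <= day <= as_of]
    return sum(values) / len(values) if values else None
```

Both functions return `None` when no stored value is available, which corresponds to the situation handled by the category-based correction described next.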

Meanwhile, when the first item information corresponding to a missing value of the collection information has no value within the predetermined period, the processor (121) according to one embodiment may, based on processing the product information, identify as the correction value the average over the predetermined period of the first item information corresponding to the missing value for the category to which the product name belongs, and correct the missing value accordingly.

More specifically, for a recently launched product, for example, little or no product information may yet have been stored and accumulated from collection information. To prevent missing values from remaining uncorrected in such a case, the processor (121) according to one embodiment may identify as the correction value, based on processing the product information, the average over the predetermined period of the first item information corresponding to the missing value for the category to which the product name belongs.

More specifically, the product information may include a group for each of at least one category, a plurality of product names corresponding to the group, and first item information corresponding to each of the product names. For example, the product names included in the nuts category may include Daily Nuts, Dr. Nuts, Today Nuts, and the like.

That is, in response to identifying a missing value for a specific product name in the collection information, the processor (121) according to one embodiment of the present disclosure may identify a correction value based on the first item information of the product information corresponding to the missing value; and, in response to that first item information being absent, may identify the correction value based on the first item information corresponding to the missing value for the category group to which the specific product name belongs.

For example, in response to the price for a first product name in the collection information being identified as a missing value, the processor (121) may, based on processing the product information, identify as the correction value the average price over a predetermined period of at least one of a second product, a third product, a fourth product, and so on in the group to which the first product name belongs, and correct the missing value with that correction value.

In another embodiment, in response to the price for the first product name in the collection information being identified as a missing value, the processor (121) may, based on processing the product information, identify as the correction value the price in the first item information corresponding to the most recent acquisition time of at least one of a second product name, a third product name, a fourth product name, and so on in the group to which the first product name belongs, and correct the missing value of the collection information with that correction value.
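
As a non-limiting illustration, the fallback from a product's own stored values to the values of other products in the same category may be sketched as follows. The data layout and function name are assumptions; the stored values are assumed to be already restricted to the predetermined period, and pooling all peer values into a single mean is one possible reading of the averaging described above.

```python
from typing import Dict, List, Optional


def correction_with_category_fallback(
        product_name: str,
        values_by_product: Dict[str, List[float]],
        category_of: Dict[str, str]) -> Optional[float]:
    """Mean of the product's own values, else mean over its category group."""
    own_values = values_by_product.get(product_name, [])
    if own_values:
        return sum(own_values) / len(own_values)

    category = category_of.get(product_name)
    if category is None:
        return None
    # Pool the values of every other product in the same category group.
    peer_values = [
        value
        for name, values in values_by_product.items()
        if name != product_name and category_of.get(name) == category
        for value in values
    ]
    return sum(peer_values) / len(peer_values) if peer_values else None
```

For a newly launched product with no history, the function therefore returns the category-level average rather than leaving the value missing.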

Referring to FIGS. 6 and 7, in response to the sales volume (71) item being identified as a missing value based on processing the collection information, the processor (121) according to one embodiment of the present disclosure may calculate a correction value based on the sales volume data in the first item information of the product information determined to denote the same product as the product name in the collection information.

For example, when the sales volume data in the collection information identified from the web page data is identified as a missing value, a missing value correction process different from the one described above may be applied. However, the present disclosure is not limited thereto, and the missing value correction process described above may also be applied.

Meanwhile, the processor (121) according to one embodiment of the present disclosure may, based on processing the web page data, identify not only missing values but also duplicate values in the identified collection information, and delete the duplicate values (duplicate data). For example, when a specific product name and the item information corresponding to it are identified more than once in the web page data (for example, when multiple records are identified for the same product name), this may serve to prevent redundant storage of the data.

More specifically, when a first page of the web page data includes a first product and item information corresponding to the first product, and a second page of the web page data also includes the first product and item information corresponding to the first product, the processor (121) may identify, based on processing the web page data, collection information that contains two entries for the first product and its item information. Accordingly, the processor (121) may delete one of the two entries, that is, one first product name and the item information corresponding to it. In other words, the processor (121) may filter out and/or delete duplicate data.
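
As a non-limiting illustration, the removal of duplicate entries may be sketched as follows. The record layout (a flat dictionary of hashable values per entry), the policy of keeping the first occurrence, and the function name are assumptions introduced for illustration.

```python
from typing import Dict, Hashable, Iterable, List


def drop_duplicate_records(
        records: Iterable[Dict[str, Hashable]]) -> List[Dict[str, Hashable]]:
    """Keep the first occurrence of each identical (product name, item
    information) record and discard later repeats."""
    seen = set()
    unique: List[Dict[str, Hashable]] = []
    for record in records:
        # Sort by field name so the key does not depend on insertion order.
        key = tuple(sorted(record.items()))
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique
```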

Meanwhile, the processor (121) according to one embodiment of the present disclosure may identify an outlier in the collection information identified based on processing the web page data.

Here, an outlier means, for example, a value of an item such as rank, price, or review information (for example, number of reviews per day) among the item information corresponding to a product name in the collection information whose deviation from the average is atypical. Among the data input to the sales volume prediction model described below, it may refer to data that is highly correlated with the predicted sales volume and that can be regarded, with high confidence, as erroneous. However, the present disclosure is not limited thereto.

For example, the processor (121) may identify an outlier based on processing both the product information, which includes pre-stored product names and the item information corresponding to them, and the collection information identified from the web page data. More specifically, the processor (121) may extract (identify or output), based on processing the product information, the interquartile range (IQR) of each item of the item information corresponding to each product name. That is, because the product information is time-series data, the processor (121) may sort the values of each item for a product name by magnitude, identify the 25% point (Q1) and the 75% point (Q3), and identify the IQR from them (for example, IQR = Q3 - Q1).

For example, when identifying the IQR of the price among the item information of a first product (a specific product) in the product information, the processor (121) may sort the daily prices by magnitude and identify the IQR from the prices at the 25% point (Q1) and the 75% point (Q3). However, the present disclosure is not limited thereto, and an IQR may be identified for each item included in the item information.

Accordingly, the processor (121) may use the identified IQR to identify outliers in the collection information identified from the web page data.

More specifically, the processor (121) may identify an upper limit (for example, upper limit = Q3 + 1.5*IQR) and a lower limit (for example, lower limit = Q1 - 1.5*IQR) for each item of the item information included in the product information. These formulas are exemplary, and the upper and lower limits may be determined using any predetermined weight applied to the IQR.

In the embodiment described above, the processor (121) may identify an outlier by comparing the item information corresponding to a specific product name, among the plurality of product names and item information in the collection information identified from the web page data, against the upper limit and/or lower limit that was identified in advance for each item of that product name by processing the product information.

For example, suppose that, based on processing the product information, the upper limit and the lower limit of a first item among the item information for a first product are identified as a first upper limit and a first lower limit, respectively. In response to the data of the first item for the first product in the collection information being greater than the first upper limit or less than the first lower limit, the processor (121) may identify that data as an outlier of the first item. Conversely, in response to the data of the first item in the collection information lying between the first lower limit and the first upper limit, the processor (121) may identify it as a normal value.
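
As a non-limiting illustration, the IQR-based limits and the outlier check may be sketched as follows. The quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the description does not state how the 25% and 75% points are interpolated; the function names are likewise hypothetical. The weight defaults to the exemplary value of 1.5.

```python
import statistics
from typing import Sequence, Tuple


def iqr_limits(stored_values: Sequence[float],
               weight: float = 1.5) -> Tuple[float, float]:
    """Return (lower limit, upper limit) from stored time-series values.

    lower = Q1 - weight * IQR, upper = Q3 + weight * IQR, IQR = Q3 - Q1.
    Requires at least two stored values.
    """
    q1, _q2, q3 = statistics.quantiles(stored_values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - weight * iqr, q3 + weight * iqr


def is_outlier(collected_value: float, lower: float, upper: float) -> bool:
    """True when the newly collected value falls outside the limits."""
    return collected_value < lower or collected_value > upper
```

As a usage example, stored daily prices of 10, 11, 12, 13, and 14 give Q1 = 11 and Q3 = 13 under this convention, so the limits are 8 and 16, and a newly collected price of 20 would be flagged as an outlier.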

Meanwhile, in response to the data of a specific item corresponding to a specific product, among the collection information identified (acquired) from the web page data, being identified as an outlier, the processor (121) according to one embodiment of the present disclosure may correct the data of that item.

In more detail, the processor (121) may perform K-means clustering on the values identified as outliers for a specific product. For example, K-means clustering may be performed on those values with K = 2 (that is, dividing them into two clusters). Since K is fixed at 2, the cluster with the higher mean may be called the major cluster and the other cluster the minor cluster.

Accordingly, the processor (121) may identify a correction value by dividing an outlier belonging to the major cluster by the maximum value (max) of the major cluster and multiplying the result by Q3, and may correct the data of the item of the specific product in which the outlier was identified with that correction value. Likewise, the processor (121) may identify a correction value by dividing an outlier belonging to the minor cluster by the maximum value (max) of the minor cluster and multiplying the result by Q3, and may correct the corresponding data with that correction value. However, the present disclosure is not limited thereto.
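
As a non-limiting illustration, the clustering of outliers with K = 2 and the rescaling of each outlier by its cluster maximum and Q3 may be sketched as follows. The one-dimensional K-means routine, its initialization at the minimum and maximum values, and the function names are assumptions introduced for illustration; the outliers are assumed to be positive so that each cluster maximum is non-zero.

```python
from typing import List, Tuple


def two_means_1d(values: List[float],
                 max_iterations: int = 100) -> Tuple[List[float], List[float]]:
    """Split values into (minor, major) clusters with 1-D K-means, K = 2."""
    low, high = min(values), max(values)  # assumed initial centroids
    minor: List[float] = []
    major: List[float] = []
    for _ in range(max_iterations):
        minor = [v for v in values if abs(v - low) <= abs(v - high)]
        major = [v for v in values if abs(v - low) > abs(v - high)]
        new_low = sum(minor) / len(minor)
        new_high = sum(major) / len(major) if major else high
        if new_low == low and new_high == high:
            break  # centroids stopped moving
        low, high = new_low, new_high
    return minor, major


def rescale_outliers(outliers: List[float], q3: float) -> List[float]:
    """Correct each outlier as value / max(its cluster) * Q3."""
    minor, major = two_means_1d(outliers)
    cluster_max = {}
    for cluster in (minor, major):
        for value in cluster:
            cluster_max[value] = max(cluster)
    return [value / cluster_max[value] * q3 for value in outliers]
```

Under this rule the largest outlier in each cluster is mapped exactly to Q3, and the other members of the cluster are mapped proportionally below it, so every corrected value is at most Q3.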

In another embodiment, the processor (121) may correct the data of an item identified as an outlier for a specific product in the collection information to a specific value (for example, Q3, Q2 (the 50% point), or Q1). In yet another embodiment, the processor (121) may correct the data of an item identified as an outlier for a specific product in the collection information based on a value produced by random number generation within the range from Q1 to Q3. However, the present disclosure is not limited thereto.
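
As a non-limiting illustration, these two alternative corrections may be sketched as follows. The uniform distribution used for the random value is an assumption, since the description states only that the value is generated within the range from Q1 to Q3; the function names are hypothetical.

```python
import random
from typing import Optional


def replace_with_quartile(q1: float, q2: float, q3: float,
                          which: str = "Q2") -> float:
    """Correct an outlier to a fixed quartile value (Q1, Q2, or Q3)."""
    return {"Q1": q1, "Q2": q2, "Q3": q3}[which]


def random_within_quartiles(q1: float, q3: float,
                            rng: Optional[random.Random] = None) -> float:
    """Correct an outlier to a random value drawn between Q1 and Q3
    (uniformly, by assumption)."""
    generator = rng if rng is not None else random.Random()
    return generator.uniform(q1, q3)
```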

FIG. 8 is a diagram for explaining a sales volume prediction model of an electronic device according to one embodiment.

Referring to FIG. 8, the processor (121) according to one embodiment of the present disclosure may perform the missing value correction process in response to identifying at least one missing value among the second item information of the collection information identified from the web page data. More specifically, in response to the identified missing value being in the sales volume item, the processor (121) may input the first item information of the product information (21) corresponding to the product name for which the missing value was identified into the sales volume prediction model (81, 82) and output a predicted sales volume. A correction value may then be identified from the output predicted sales volume (83), and the missing value of the second item information may be corrected based on that correction value.

That is, the processor (121) may input product information, which is time-series data ordered by acquisition time for the product name corresponding to the collected information in which the missing value occurred, into the sales volume prediction model, predict the sales volume at the acquisition time of the collected information containing the missing value, identify the predicted sales volume as a correction value, and correct the missing value based on the correction value.

More specifically, the processor (121) may input the item information and review information included in the product information, which is time-series data ordered by acquisition time for the product name corresponding to the collected information in which the missing value occurred, into the sales volume prediction model, predict the sales volume at the acquisition time of the collected information containing the missing value, identify the predicted sales volume as a correction value, and correct the missing value based on the correction value.
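The missing value correction flow above can be sketched as follows. This is only an illustration under stated assumptions: `impute_missing_sales` and `review_scaled_model` are hypothetical names, and the stand-in predictor (twice the latest review count) replaces the trained sales volume prediction model purely so the example runs.

```python
from typing import Callable, List, Optional, Sequence

def impute_missing_sales(
    sales: Sequence[Optional[float]],
    features: Sequence[Sequence[float]],
    predict: Callable[[Sequence[Sequence[float]]], float],
) -> List[float]:
    """Fill each missing sales value with the model's prediction.

    `features[t]` holds the item/review features collected at acquisition
    time t. For a missing entry at time t, the feature history up to and
    including t is passed to `predict`, and its output is the correction value.
    """
    filled = []
    for t, value in enumerate(sales):
        if value is None:                      # missing value identified
            value = predict(features[: t + 1])  # predicted sales = correction value
        filled.append(value)
    return filled

def review_scaled_model(history: Sequence[Sequence[float]]) -> float:
    """Hypothetical stand-in model: sales = 2 x latest review count."""
    return 2.0 * history[-1][0]
```

A call such as `impute_missing_sales([5.0, None, 7.0], [[2.0], [3.0], [4.0]], review_scaled_model)` fills only the second entry and leaves observed values untouched.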

The processor (121) may include at least one processor for processing various types of data, for example, a plurality of processors. For example, the processor (121) may include a learning processor for machine learning and may train a model composed of an artificial neural network using training data. The trained artificial neural network may be referred to as a learning model. The learning model may be used to infer a result value for new input data other than the training data, and the inferred value may be used as a basis for a decision to perform a certain operation.

The processor (121) according to one embodiment may train the learning model through a deep learning algorithm. More specifically, the processor (121) may train the learning model through a Temporal Convolutional Network (TCN) algorithm or a Long Short-Term Memory (LSTM) algorithm, or through the XGBoost algorithm, which is a gradient-boosted decision tree method rather than a neural network.

A deep learning algorithm is a type of machine learning algorithm and refers to a modeling technique developed from artificial neural networks, which are modeled after the human neural network. An artificial neural network may be organized as a multi-layer hierarchical structure.

An artificial neural network (ANN) may be composed of a hierarchical structure including an input layer, an output layer, and at least one intermediate layer (also called a hidden layer; for example, a kernel) between the input layer and the output layer. Based on this multi-layer structure, a deep learning algorithm can derive reliable results through training that optimizes the weights connecting the layers, whose outputs are passed through activation functions.

Deep learning algorithms applicable to the processor (121) according to one embodiment of the present disclosure may include, for example, a convolutional neural network (CNN) or a recurrent neural network (RNN), but are not limited thereto. It will be understood that other deep learning algorithms may be applied depending on the embodiment.

Unlike techniques in which learning is performed by extracting knowledge from existing data, a convolutional neural network (CNN) has a structure that extracts features of the data and identifies patterns among those features. A CNN operates through a convolution process and a pooling process. In other words, a CNN may include an algorithm in which convolution layers and pooling layers are combined. In a convolution layer, a process of extracting features of the data (for example, the convolution process) is performed. The convolution process examines the components adjacent to each component of the data to identify features and derives the identified features as a single sheet (a feature map); as a form of compression, it can effectively reduce the number of parameters. In a pooling layer, a process of reducing the size of the layer that has passed through the convolution process (for example, the pooling process) is performed. The pooling process can reduce the size of the data, cancel out noise, and provide features that remain consistent under small variations. For example, a CNN may be used in various fields such as information extraction, sentence classification, and face recognition. Since the CNN is a well-known technology, a detailed description is omitted below.
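As a minimal illustration of the convolution and pooling processes described above, the following sketch applies them to a one-dimensional sequence. The function names are hypothetical, and the one-dimensional setting is chosen only for brevity; it is not taken from the description.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D convolution as used in deep learning (cross-correlation).

    Each output combines a component with its neighbours, which is how a
    convolution layer extracts local features.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def max_pool1d(signal, size):
    """Non-overlapping max pooling; a trailing partial window is dropped.

    Pooling shrinks the sequence and keeps the strongest response per window.
    """
    return [
        max(signal[i:i + size])
        for i in range(0, len(signal) - size + 1, size)
    ]
```

With the kernel `[1, 0, -1]`, `conv1d` responds to the local slope of the input, and `max_pool1d` then halves the length when `size=2`.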

Accordingly, the processor (121) may input the product information corresponding to the product name for which a missing value occurred in the second item information into the sales volume prediction model, output the predicted sales volume at the acquisition time of the collected information containing the missing value, and identify the output predicted sales volume as a correction value.

The sales volume prediction model according to one embodiment may be, for example, an artificial intelligence model trained separately for each category group. That is, the model is trained for each of a plurality of groups defined by category, based on the plurality of product names included in each category group and the first item information corresponding to those product names. In other words, the sales volume prediction model may be a model trained to take as input a plurality of product names corresponding to a category and the first item information corresponding to each product name, and to output a predicted sales volume specialized for that category.

In another embodiment, the sales volume prediction model may be an artificial intelligence model trained for each product name. That is, the model is trained based on each product name and the first item information corresponding to that product name, learns the correlations characteristic of a specific product (product name), and can thereby predict a sales volume specialized for that product.

More specifically, the sales volume prediction model is trained based on each product name and the first item information for a specific period corresponding to that product name, learns the correlations characteristic of a specific product (product name, product ID), and can thereby predict a sales volume specialized for a specific product and a specific period. For example, the specific period may be 28 days of product information, but is not limited thereto.
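One way to turn per-product time-series data into training samples covering a 28-day period is a sliding window, sketched below. The record layout (a list of per-day dictionaries with a `"sales"` key), the `horizon` parameter, and the name `make_windows` are assumptions for illustration; the description only states that a 28-day period may be used.

```python
def make_windows(series, window=28, horizon=1):
    """Build (input, target) training pairs from one product's daily records.

    `series` is a chronologically ordered list of per-day dicts, each holding
    the first item information plus a "sales" value (assumed layout).
    The input is `window` consecutive days; the target is the sales value
    `horizon` days after the window ends.
    """
    samples = []
    for start in range(len(series) - window - horizon + 1):
        x = series[start:start + window]
        y = series[start + window + horizon - 1]["sales"]
        samples.append((x, y))
    return samples
```

Thirty days of records therefore yield two samples with the default settings, each with a 28-day input.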

Meanwhile, the sales volume prediction model according to one embodiment of the present disclosure may be a model trained to output a predicted sales volume by taking as input product information including review information and brand information, which is described in detail below.

Meanwhile, the sales volume prediction model according to one embodiment of the present disclosure may further include an embedding layer so that it can distinguish and learn the different trends associated with the attributes of specific products. That is, in addition to being trained on product information for a specific product and/or brand information including inventory information for that product so as to achieve high prediction accuracy for the product, the model further includes an embedding layer and thereby learns the product information and/or brand information of each product separately. As a result, when product information and/or brand information for a specific product is input to a single sales volume prediction model, the model can output a result that is distinguished from the trends (attributes) of other products. By including the embedding layer, the sales volume prediction model can capture semantic similarities in the data for a specific product and learn their meaning.
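Conceptually, an embedding layer is a lookup table that maps each product identifier to a dense vector that is learned during training and fed to the model together with the other features. The sketch below shows only the lookup and concatenation; the class name, vector size, and random initialization are assumptions for illustration, and no training step is included.

```python
import random

class ProductEmbedding:
    """Lookup table mapping a product ID to a dense vector.

    In a real model these vectors would be updated by training; here they are
    only randomly initialized to show the data flow.
    """

    def __init__(self, product_ids, dim=4, seed=0):
        rng = random.Random(seed)
        self.table = {
            pid: [rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for pid in product_ids
        }

    def __call__(self, product_id):
        return self.table[product_id]

def build_model_input(embedding, product_id, numeric_features):
    """Concatenate the product's embedding with its numeric features.

    The embedding part lets a single shared model tell products apart.
    """
    return embedding(product_id) + list(numeric_features)
```

Two products with identical numeric features still produce different model inputs, which is what allows one model to learn product-specific trends.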

That is, by learning the characteristics of a specific product or its characteristics over a period, the sales volume prediction model can output a more accurate predicted sales volume for that product when product information and/or brand information for the product is input. However, the present disclosure is not limited thereto.

Meanwhile, the sales volume prediction model according to one embodiment of the present disclosure may be an ensemble model trained to output a predicted sales volume by taking the output values of a plurality of models as input. That is, the sales volume prediction model may be composed of at least two sales volume prediction models that use different algorithms and are each trained on the training data (raw data) so as to have the same input and output. For example, the sales volume prediction model may include a third sales volume prediction model trained to output a predicted sales volume by taking the outputs of a first sales volume prediction model and a second sales volume prediction model as input. However, the present disclosure is not limited thereto.

In another embodiment, the sales volume prediction model may include a fourth sales volume prediction model trained to output a predicted sales volume by taking the outputs of the first, second, and third sales volume prediction models as input.
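A model that is trained on the outputs of other models is commonly called a stacking ensemble. The sketch below fits the simplest possible meta-model, a linear combination of two base predictions obtained by least squares. The linear form and the name `fit_stacker` are assumptions for illustration; the description does not specify what kind of model the higher-level model is.

```python
def fit_stacker(p1, p2, y):
    """Least-squares weights (a, b) such that a*p1 + b*p2 approximates y.

    p1 and p2 are the base models' predictions on held-out data, and y holds
    the observed sales. The 2x2 normal equations are solved in closed form.
    """
    s11 = sum(u * u for u in p1)
    s22 = sum(v * v for v in p2)
    s12 = sum(u * v for u, v in zip(p1, p2))
    t1 = sum(u * t for u, t in zip(p1, y))
    t2 = sum(v * t for v, t in zip(p2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("base predictions are collinear")
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    return a, b

def stacked_prediction(weights, pred1, pred2):
    """Apply the fitted meta-model to new base predictions."""
    a, b = weights
    return a * pred1 + b * pred2
```

Extending the same idea to three base models only adds one more input column to the meta-model.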

In another embodiment, referring to FIG. 8, the processor (121) according to one embodiment of the present disclosure may input the product information (21) into the first sales volume prediction model (81) to output a first predicted sales volume, and input the product information (21) into the second sales volume prediction model (82) to output a second predicted sales volume.

Accordingly, the processor (121) may identify (output) the predicted sales volume based on the output first predicted sales volume and second predicted sales volume.

The first sales volume prediction model (81) according to one embodiment is a model trained based on the Temporal Convolutional Network (TCN) algorithm, and the second sales volume prediction model (82) is a model trained based on the Long Short-Term Memory (LSTM) algorithm. However, the present disclosure is not limited thereto. In another embodiment, any one of the first to third sales volume prediction models may be a model trained based on, for example, the Extreme Gradient Boosting (XGBoost) algorithm.

The TCN algorithm and the LSTM algorithm according to one embodiment of the present disclosure are deep learning models for processing sequential (time-series) data. More specifically, a TCN is a model that applies a convolutional neural network (CNN) to sequence data and is designed to learn temporal dependencies. A TCN can process sequences of arbitrary length, can effectively learn long temporal dependencies, and generally offers faster training and better parallelism than an LSTM. The main feature of a TCN is that it uses convolution layers to learn patterns in the input sequence and, by increasing its depth, can capture long-range sequence dependencies, that is, correlations between time and data.
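The building block that lets a TCN see further back as it gets deeper is the dilated causal convolution. The following sketch shows a single such operation and the resulting receptive field of a stack of layers. It assumes one convolution per layer and zero padding on the left, which is a simplification of typical TCN implementations and is not stated in the description.

```python
def causal_dilated_conv(x, kernel, dilation=1):
    """Compute y[t] = sum_j kernel[j] * x[t - j*dilation].

    Positions before the start of the sequence are treated as zero, so the
    output at time t never depends on future values (causality).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations):
    """Number of time steps visible to the top of a stack of layers,
    assuming one dilated convolution per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With a kernel size of 2 and dilations 1, 2, 4, and 8, four layers already cover 16 time steps, which illustrates how depth extends the temporal range.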

In addition, an LSTM is a type of recurrent neural network (RNN) and is mainly used to process sequence data and time-series data. More specifically, an LSTM has internal states that manage "short-term memory" and "long-term memory", which allows it to learn dependencies over long time spans, that is, correlations between time and data.
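The two internal states mentioned above correspond to the hidden state and the cell state of an LSTM cell. A single time step of a one-unit LSTM can be sketched as follows, using the standard gate equations. The parameter layout (`params` mapping a gate name to a weight triple) is an assumption made to keep the example small.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM.

    `params` maps each gate name to (w_x, w_h, bias). Returns the new hidden
    state h (short-term memory) and cell state c (long-term memory).
    """
    def gate(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much old cell state to keep
    i = gate("input", sigmoid)         # how much new information to write
    g = gate("candidate", math.tanh)   # the new information itself
    o = gate("output", sigmoid)        # how much of the cell state to expose
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c
```

Running `lstm_step` repeatedly over a sequence, feeding each returned `h` and `c` into the next call, gives the recurrent behaviour described above.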

The processor (121) according to one embodiment of the present disclosure may train the first sales volume prediction model (81), the second sales volume prediction model (82), and a third sales volume prediction model (not shown) to have the same input and output as the sales volume prediction model described above. In another embodiment, the third sales volume prediction model trained based on the XGBoost algorithm may, unlike the first and second sales volume prediction models, be trained with only the sales volume of a specific product over time (period) as input.

In yet another embodiment, a sales volume prediction model trained based on the XGBoost algorithm may be trained on the price, average rating (or average of reference ratings), cumulative number of reviews, wishlist count, ranking, and sales volume included in the product information of a specific product, and may output a predicted sales volume by taking the price, average rating (or average of reference ratings), cumulative number of reviews, and wishlist count of the specific product as input.

More specifically, the processor (121) may analyze and process raw data relating to the first item information of the product information (here, the raw data may be understood as training data) to build the first, second, and third sales volume prediction models. That is, the processor (121) may train the first, second, or third sales volume prediction model to output time-series data of predicted sales volume for each prediction date when the first item information of the product information is input.

Meanwhile, the first sales volume prediction model trained with TCN is capable of precise prediction for each prediction date, whereas the LSTM model focuses on predicting the trend (for example, rising, holding, or falling) rather than on precise day-by-day prediction. The third sales volume prediction model trained with XGBoost can predict with high accuracy even from a small amount of raw data (training data).

Based on these characteristics, the processor (121) according to one embodiment of the present disclosure may input product information for a specific product into the first and second sales volume prediction models, assign weights to the resulting first predicted sales volume and second predicted sales volume, and output (identify) the final predicted sales volume (83) based on the weighted first and second predicted sales volumes.

More specifically, the processor (121) may assign the weight of the first predicted sales volume in inverse proportion to the prediction period, which is the interval between the current date and the date to be predicted, and assign the weight of the second predicted sales volume in proportion to the prediction period. Accordingly, the processor (121) may output (identify) the final predicted sales volume (83) based on the weighted first and second predicted sales volumes.
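The weighting rule above can be sketched as follows: the TCN-based prediction dominates for near dates and the LSTM-based prediction dominates for distant dates. The normalization so that the two weights sum to one, and the `pivot_days` parameter (the horizon at which both models are weighted equally), are assumptions introduced for this illustration; the description only states the direction of proportionality.

```python
def horizon_weights(horizon_days, pivot_days=7.0):
    """Return (w_first, w_second) for a given prediction period.

    The first weight is inversely proportional to the horizon and the second
    is proportional to it; both are normalized to sum to 1.
    """
    if horizon_days <= 0:
        raise ValueError("horizon_days must be positive")
    w_first = pivot_days / horizon_days    # favours short horizons
    w_second = horizon_days / pivot_days   # favours long horizons
    total = w_first + w_second
    return w_first / total, w_second / total

def combine_predictions(pred_first, pred_second, horizon_days, pivot_days=7.0):
    """Weighted final prediction from the first (TCN) and second (LSTM) models."""
    w1, w2 = horizon_weights(horizon_days, pivot_days)
    return w1 * pred_first + w2 * pred_second
```

With the default pivot, a one-day-ahead prediction relies almost entirely on the first model, while a 28-day-ahead prediction relies mostly on the second.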

In another embodiment, when the sales volume prediction model further includes a third sales volume prediction model, the processor (121) may input product information for a specific product into the first, second, and third sales volume prediction models, assign weights to the resulting first, second, and third predicted sales volumes, and output (identify) the final predicted sales volume based on the weighted first to third predicted sales volumes.

Meanwhile, the processor (121) according to one embodiment of the present disclosure may train the sales volume prediction model to output a predicted sales volume by taking the product information (21) and/or the brand information as input. That is, the processor (121) may take as input the product information for a specific product name and the inventory information included in the brand information, and train the model on the correlation between the items included in the first item information and the inventory information (for example, the relationship between the inventory information and the item information or the reference rating included in the review information) and on the correlation between time and sales volume, so that a predicted sales volume is output.

More specifically, when processing the inventory information shows that there was no stock for a certain period, the processor (121) may calculate, day by day, the lost sales that would have been realized had stock been available, based on the quantity that could have been sold on each day of that period. In this case, the processor (121) may take as input the product information, including item information and review information, together with the lost sales, and train the model on the correlation between the items included in the first item information and the inventory information (for example, the relationship between the inventory information and the item information or the reference rating included in the review information) and on the correlation between time and sales volume, so that a predicted sales volume is output. However, the present disclosure is not limited thereto.

Meanwhile, the processor (121) according to one embodiment of the present disclosure may identify, through integrated gradients, the contribution of the data included in the product information that is input to the sales volume prediction model. That is, through integrated gradients, the processor (121) may identify the degree (contribution) to which each data value of the first item information included in the product information contributes to the output of the sales volume prediction model.

Here, integrated gradients is one of the methods for interpreting the contribution of input features in a machine learning model, that is, for determining how important each input is to the output predicted by an artificial intelligence model. More specifically, integrated gradients may include: a step of setting the model input data and a baseline value, in which the input data and the input configuration of the model are determined and the baseline value is set as the starting point for the input data; a step of dividing the path from the baseline value to the actual input data into equal intervals; a step of computing the gradient of the model at the midpoint of each interval; and a step of integrating the computed gradients to generate an importance score indicating the importance of the input data.
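The four steps above can be sketched for a model given as a plain Python function. The gradient is approximated here by central finite differences, which is an assumption made so the example needs no deep learning framework; with a neural network the gradient would normally come from automatic differentiation. The function names are hypothetical.

```python
def numerical_gradient(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def integrated_gradients(f, x, baseline, steps=50):
    """Midpoint-rule approximation of integrated gradients.

    The straight path from `baseline` to `x` is split into `steps` equal
    intervals, the gradient is evaluated at each interval's midpoint, and the
    averaged gradient is scaled by (x - baseline) per feature.
    """
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps                     # midpoint of interval k
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(numerical_gradient(f, point)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]
```

A useful sanity check is that the importance scores sum approximately to the difference between the model output at the input and at the baseline.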

That is, through integrated gradients applied to the sales volume prediction model, the processor (121) according to one embodiment of the present disclosure may identify the contribution (importance) of each piece of first item information data included in the product information to the predicted sales volume. Accordingly, it is possible to determine which data in the product information (number of reviews, rating, and so on) contributes to the predicted sales volume.

Meanwhile, the processor (121) according to one embodiment of the present disclosure may obtain, through integrated gradients applied to the sales volume prediction model, the contribution of each piece of first item information included in the product information, and may control the communication unit (110) to transmit to the user terminal (300) each item whose contribution is greater than a predetermined value together with the value of that item. Accordingly, the user can identify which specific element (any one of the first item information) of a specific product or a specific category has a large influence on sales volume (revenue). However, the present disclosure is not limited thereto.

FIG. 9 is a diagram for explaining review information collected by an electronic device according to one embodiment. FIG. 10 is a diagram for explaining how an electronic device according to one embodiment identifies representative sentences of review information.

Referring to FIGS. 6 and 9, the processor (121) according to one embodiment of the present disclosure may identify, based on processing web page data obtained from the server (200), collected information including at least one product name and at least one piece of review information (51) corresponding to the product name. The review information (51) may be, for example, metadata associated with the product name that includes an author ID for the product name, the rating corresponding to that ID, the customer review text, product photos, wishlist count, like count, posting date, category, sales options, and the like. However, this is merely an example, and in another embodiment the review information (51) may include a plurality of author IDs for the product name and the ratings and reviews corresponding to those IDs. For convenience of explanation, the following description assumes that, as illustrated in FIG. 9, the processor (121) identifies collected information including one product name and one piece of review information corresponding to the product name based on processing the web page data.

In response to identifying the collected information, the processor (121) according to one embodiment of the present disclosure may identify at least one representative sentence of the review information based on processing the collected information. For example, when the review information includes a plurality of reviews, the processor (121) may identify at least one representative sentence for each review.

More specifically, the processor (121) may identify at least one representative sentence by decomposing the review information into a plurality of sentences through n-grams, identifying the similarity between each of the decomposed sentences and the review information, and selecting a predetermined number of sentences with the highest identified similarity from among the plurality of sentences.

Referring to FIG. 10, an n-gram according to one embodiment is a method of dividing text into sequences of n consecutive words or characters, where n is a parameter that determines the size of the n-gram and may be a predetermined value. Accordingly, based on processing the review information, the processor (121) according to one embodiment may decompose the review information through n-grams into a plurality of sentences, each consisting of n consecutive words.
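A word-level n-gram decomposition of a review can be sketched in a few lines. Splitting on whitespace and the name `word_ngrams` are assumptions for illustration; Korean review text would typically need a language-specific tokenizer, which is outside this sketch.

```python
def word_ngrams(text, n):
    """Split `text` into every run of n consecutive whitespace-separated words.

    Returns an empty list when the text has fewer than n words.
    """
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
```

Each returned fragment plays the role of one decomposed "sentence" in the description above.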

Thereafter, the processor (121) may identify the similarity between each of the decomposed sentences and the entire review of the review information (51). Here, identifying the similarity may be implemented through the sentence embedding described above. More specifically, the processor (121) according to one embodiment may compute a sentence embedding for each of the decomposed sentences and for the review of the review information (51). Accordingly, the processor (121) may identify the cosine similarity between each of the sentences and the review information (51).

For example, as illustrated in FIG. 10, the processor (121) may decompose the review information (51) into first to fifth sentences using n-grams. In this case, the processor (121) may calculate sentence embeddings of the first to fifth sentences and of the review included in the review information (51). The processor (121) may then identify the cosine similarity between the sentence embedding of each of the first to fifth sentences and the sentence embedding of the review information (51). Accordingly, the processor (121) may identify at least one representative sentence by selecting, in descending order of similarity, a predetermined number of sentences from among the first to fifth sentences. In the illustrated example, the predetermined number is set to three.
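The selection of representative sentences by cosine similarity can be sketched as follows. The disclosure does not name a specific embedding model, so the `embed` function below is a toy word-count embedding used purely as a stand-in; all function names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (word counts). A stand-in for a real sentence-embedding model."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def representative_sentences(review: str, candidates: list[str], k: int) -> list[str]:
    """Return the k candidates most similar to the whole review."""
    review_vec = embed(review)
    scored = [(cosine_similarity(embed(c), review_vec), c) for c in candidates]
    # Highest similarity first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


# Hypothetical review and its 3-word fragments.
review = "great fabric great price slow delivery"
candidates = [
    "great fabric great",
    "fabric great price",
    "great price slow",
    "price slow delivery",
]
print(representative_sentences(review, candidates, k=3))
```

With a real sentence-embedding model, only `embed` and the vector type would change; the ranking and top-k selection stay the same.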

Meanwhile, the processor (121) according to one embodiment of the present disclosure may input the identified at least one representative sentence into a rating extraction model to extract (output) a rating of the review information. A review and its corresponding rating reflect people's varied emotions and the subjective ratings that follow from them. A person who values a specific area (e.g., product quality or service quality) may therefore give a high score when that area is served well, even if satisfaction with other areas is low. A rating given in this way may have little discriminative power, so to address this problem the processor (121) may newly extract (output) a rating for the review information (51). The newly output rating is referred to as a reference rating. That is, the reference rating may mean the rating output by the rating extraction model when the review information is input.

FIG. 11 is a diagram for explaining training data used to train the rating extraction model of the electronic device according to one embodiment. FIG. 12 is a diagram for explaining rating extraction by the electronic device according to one embodiment.

Referring to FIGS. 11 and 12, the processor (121) according to one embodiment of the present disclosure may train the rating extraction model (52) by inputting a plurality of pieces of review information (51) together with, for each piece of review information (51), the overall sentiment of the review (overall sentiment), the sentiment toward the product (product sentiment), the sentiment toward the service (service sentiment), the review type, and the rating. That is, the rating extraction model (52) is trained on the review information (51) and raw data (training data) obtained by having a person directly evaluate and label the review information, so that when review information containing a review sentence is input, the model may output at least one of the overall sentiment, product sentiment, service sentiment, review type, or rating corresponding to that sentence (review). The overall sentiment, product sentiment, and service sentiment may each be labeled as, for example, positive, negative, or neutral. The review type may be labeled as one of a simple review or an authentic review, distinguished by the length of the review, or an artificial review containing redundant content.

In this case, as described above, the processor (121) may identify at least one representative sentence of the review information based on processing the collection information that includes the review information (51), and may input the at least one representative sentence into the rating extraction model (52) to output at least one of the overall sentiment, product sentiment, service sentiment, review type, or rating.

When the representative sentences include, for example, a first sentence and a second sentence, the processor (121) according to one embodiment may extract the rating of the review information based on a first score output by inputting the first sentence into the rating extraction model (52) and a second score output by inputting the second sentence into the rating extraction model (52). More specifically, the processor (121) may extract (output) the average of the first score and the second score as the rating of the review information.

For example, the rating extraction model (52) may be an ensemble model including a plurality of rating extraction models. The rating extraction model (52) according to one embodiment may include a first model and a second model. More specifically, the rating extraction model may include at least one of a first model or a second model, each trained on a training dataset of review sentences and their corresponding ratings so that, when review information (51) containing a review sentence is input, the rating corresponding to that sentence is output.

The first model and the second model are artificial intelligence models trained based on different algorithms. For example, the first model or the second model may be one of the KeyBERT algorithm, the KoBERT algorithm, the DeBERTa algorithm, or the XLNet algorithm.

In the above example, the processor (121) may identify the review information based on processing the collection information, and may input at least one representative sentence, identified based on processing the review information, into the first model and the second model to obtain (extract or output) a first output and a second output, respectively. In this case, the processor (121) may extract the rating of the review information based on the first output and the second output.

More specifically, the processor (121) may input the at least one representative sentence into the first model to obtain a first output regarding the rating, and may input the at least one representative sentence into the second model to obtain a second output regarding the rating. The processor (121) according to one embodiment may then identify (extract) the average of the first output and the second output as the rating.

Meanwhile, in another embodiment, the processor (121) may input the at least one representative sentence into the first model, the second model, and a third model trained through an algorithm different from those of the first and second models, to obtain a first output, a second output, and a third output, each including at least one of the overall sentiment, product sentiment, service sentiment, review type, or rating for the at least one representative sentence. In this case, the processor (121) may extract (output) a final overall sentiment based on the first, second, and third outputs for the overall sentiment. Likewise, for the product sentiment, service sentiment, review type, or rating, the processor (121) may output a final product sentiment, final service sentiment, final review type, or final rating based on the first to third models described above.

That is, outputting the final overall sentiment based on the first to third outputs means, for example, that when the first, second, and third outputs for the overall sentiment are positive, positive, and negative, respectively, the sentiment held by the absolute majority is output, so the final overall sentiment is positive. Likewise, the final product sentiment, final service sentiment, or final review type may be extracted according to the sentiment or value (data) held by the absolute majority of the first to third outputs. However, the disclosure is not limited thereto; for the rating, the final rating may be output as the average of the first to third outputs.

Meanwhile, in the above embodiment, when there is no absolute majority, the processor (121) may extract (output) the result of the first output as the final result. For example, when a representative sentence is input and the first, second, and third outputs for the overall sentiment are positive, neutral, and negative, respectively, the processor (121) may extract (output) positive, the result of the first output from the first model, as the overall sentiment of the input representative sentence.
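The combination rule described above (absolute-majority vote for categorical outputs, fallback to the first model when no majority exists, and averaging for ratings) can be sketched as follows. The function names are hypothetical, and the label strings are illustrative only.

```python
from collections import Counter
from statistics import mean


def vote(labels: list[str]) -> str:
    """Return the label held by more than half of the models.

    When no label reaches an absolute majority, fall back to the first
    model's output, as the description specifies.
    """
    label, count = Counter(labels).most_common(1)[0]
    if count > len(labels) / 2:
        return label
    return labels[0]


def combine_ratings(ratings: list[float]) -> float:
    """Ratings are averaged rather than voted on."""
    return mean(ratings)


# Outputs of the first, second, and third models for one representative sentence.
print(vote(["positive", "positive", "negative"]))  # majority exists
print(vote(["positive", "neutral", "negative"]))   # no majority: first model wins
print(combine_ratings([4.0, 5.0, 3.0]))
```

Because the fallback depends on list order, the first element must always be the first model's output.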

In yet another embodiment, when the representative sentences include, for example, a first sentence, a second sentence, and a third sentence, the processor (121) may input the first sentence into the first model, the second model, and the third model trained through an algorithm different from those of the first and second models, to obtain first to third outputs. The processor (121) may likewise input the second sentence into the first model, the second model, and the third model to obtain first to third outputs, and may input the third sentence into the first model, the second model, and the third model to obtain first to third outputs.

In this case, the processor (121) may extract (output) a final overall sentiment, final product sentiment, final service sentiment, final review type, or final rating based on the first to third outputs obtained by inputting the first sentence into the first to third models. The processor (121) may likewise extract (output) a final overall sentiment, final product sentiment, final service sentiment, final review type, or final rating based on the first to third outputs obtained by inputting the second sentence into the first to third models. That is, for each of the at least one representative sentence identified in a review included in the review information, the processor (121) may extract (output) a final overall sentiment, final product sentiment, final service sentiment, final review type, or final rating based on the first to third outputs obtained by inputting that sentence into the first to third models.

Accordingly, the processor (121) may extract (output), as the overall sentiment, product sentiment, service sentiment, or review type of the review information, the sentiment or data held by the absolute majority of the final overall sentiments, final product sentiments, final service sentiments, or final review types of the representative sentences. However, the disclosure is not limited thereto; for the rating, the rating may be output as the average of the final ratings of the representative sentences.
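The review-level aggregation across representative sentences can be sketched as follows. The field names and the dictionary layout are hypothetical, and the tie-breaking behavior is an assumption, since the description does not say what happens when the per-sentence results have no majority.

```python
from collections import Counter
from statistics import mean

# Hypothetical field names for the categorical outputs.
CATEGORICAL_FIELDS = ("overall", "product", "service", "review_type")


def aggregate_review(per_sentence: list[dict]) -> dict:
    """Combine per-sentence final results into one review-level result."""
    result = {}
    for field in CATEGORICAL_FIELDS:
        values = [s[field] for s in per_sentence]
        # most_common(1) picks the most frequent value; on a tie it returns
        # the value seen first. That tie rule is an assumption, not from the source.
        result[field] = Counter(values).most_common(1)[0][0]
    # The review-level rating is the mean of the per-sentence final ratings.
    result["rating"] = mean(s["rating"] for s in per_sentence)
    return result


# Hypothetical final results for three representative sentences.
sentences = [
    {"overall": "positive", "product": "positive", "service": "neutral",
     "review_type": "authentic", "rating": 4},
    {"overall": "positive", "product": "negative", "service": "neutral",
     "review_type": "authentic", "rating": 5},
    {"overall": "negative", "product": "positive", "service": "negative",
     "review_type": "simple", "rating": 3},
]
print(aggregate_review(sentences))
```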

Meanwhile, it will be understood that, in addition to the KeyBERT, KoBERT, DeBERTa, or XLNet algorithms, any known natural language processing algorithm or any natural language processing algorithm developed in the future may be applied to the rating extraction model (52) described above. It will also be understood that a plurality of models beyond the first to third models may be applied, with the overall sentiment, product sentiment, service sentiment, review type, or rating of the review information output based on the output of each model.

As described in the above embodiments, the processor (121) according to one embodiment of the present disclosure may identify a product name included in the collection information and at least one piece of review information corresponding to the product name, and may extract a rating of the review information based on processing the review information. That is, based on the review text included in the review information, the processor (121) may extract an objective reference rating that is separate from the customer ratings included in the review information.

More specifically, based on processing the webpage data, the processor (121) may input into the rating extraction model the review information included in the identified collection information, which corresponds to a product name and includes the author ID for that product name and the rating associated with the ID, customer review text, product photos, number of wishlist saves, number of likes, posting date, category, sales options, and the like, to extract a reference rating, and may update the reference rating for the review information by adding a reference rating item to the review information. That is, the processor (121) may update the review information by adding a reference rating item to it, based on processing the review information included in the collection information.

In the collection information that the processor (121) according to one embodiment of the present disclosure obtains based on processing the webpage data, which includes a product name and first item information corresponding to the product name, the first item information may include review information. However, this review information includes only the rating information given by customers and does not include a reference rating obtained through the rating extraction model. In this case, when updating pre-stored product information, the processor (121) may, for example, create a reference rating item as an additional item in the first item information for a specific product name and store, in metadata format, the reference rating obtained by inputting the review information for that product name.
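The metadata update described above can be sketched as follows. The dictionary layout and the key names (`first_item_info`, `customer_rating`, `reference_rating`) are hypothetical; the disclosure states only that a reference rating item is added alongside the customer-given rating.

```python
import copy


def add_reference_rating(product_info: dict, product_name: str,
                         reference_rating: float) -> dict:
    """Return a copy of product_info with a reference-rating item added."""
    updated = copy.deepcopy(product_info)
    item_info = updated[product_name]["first_item_info"]
    # The customer-given rating is kept; the model-derived rating is stored
    # next to it as a new item.
    item_info["reference_rating"] = reference_rating
    return updated


# Hypothetical pre-stored product information.
stored = {
    "cotton tee": {
        "first_item_info": {"customer_rating": 4.8, "review_count": 120},
    },
}
print(add_reference_rating(stored, "cotton tee", 3.9))
```

Returning a copy keeps the pre-stored record unchanged until the caller decides to persist the update.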

Meanwhile, the processor (121) according to one embodiment of the present disclosure may output at least one value indicator of a brand by inputting into the brand value evaluation model: the predicted sales volume output by inputting product information for at least one product name of a specific brand into the sales volume prediction model described above; the review information corresponding to the at least one product name; and brand information pre-stored in the memory (122) or acquired through the communication unit (110). More specifically, the processor (121) may output at least one value indicator of the brand by inputting into the brand value evaluation model: the predicted sales volume output by inputting the product information for the at least one product name of the specific brand into the sales volume prediction model; the overall sentiment, product sentiment, service sentiment, review type, or reference rating output by inputting the review information corresponding to the at least one product name into the rating extraction model described above; and the brand information.

Here, the brand information may include, for example, at least one of: transaction information including a transaction period; sales information including weekly average sales, weekly sales volatility, and volatility of the ratio of weekly sales to annual sales; sales channel information including the number of sales channels and the sales distribution by channel; customer information including the daily number of transacting customers, the number of loyal customers, and the number of new customers; reputation information including the return rate and the number of claims; financial information including the cost-of-sales ratio, advertising cost ratio, logistics cost ratio, service cost ratio, inventory value relative to sales, and accounts receivable collection lead time; eWOM information including the customer rating, total number of customer reviews, number of customer reviews by type, overall sentiment per review, number of SNS reviews, number of SNS reviews by type, overall sentiment per SNS review, and seller ranking in the relevant category; inventory information including lost GMV, out-of-stock rate, days of inventory, turnover rate, and order delivery rate; product information including the number of SKUs, sales per SKU, number of new SKU launches, SKU life cycle, SKU clicks, dwell time on the SKU detail page, order cancellation rate per SKU, and delivery lead time per SKU; price information including the weekly average price, weekly average price volatility, and promotional events; and other information including seasonality.

Meanwhile, the processor (121) according to one embodiment of the present disclosure may train the brand value evaluation model based on the predicted sales volume for at least one product name of a specific brand; the overall sentiment, product sentiment, service sentiment, review type, or rating of the review information corresponding to the at least one product name of the specific brand; the brand information; and training data obtained by having a person directly evaluate and label the specific brand. The training data may include, for example, a review index score, a rating index score, a ranking index score, and a profit index score. That is, the brand value evaluation model may be an artificial intelligence model trained to take the predicted sales volume, the review information, and the brand information as input and output a review index score, a rating index score, a ranking index score, and a profit index score. Each of these index scores may be labeled with a score from 0 to 10.

Meanwhile, the processor (121) according to one embodiment of the present disclosure may use integrated gradients on the brand value evaluation model to identify the contribution (importance) of each input, namely the predicted sales volume for at least one product name of a specific brand, the overall sentiment, product sentiment, service sentiment, review type, or rating of the review information corresponding to the at least one product name of the specific brand, and the brand information, to each indicator (review indicator, rating indicator, ranking indicator, and profit indicator). A user can thereby identify which improvements may raise the value of the brand. However, the disclosure is not limited thereto.

In another embodiment, the processor (121) may obtain the contribution of each piece of input information through integrated gradients of the brand value evaluation model, and may control the communication unit (110) to transmit, to the user terminal (300), the items whose contribution is greater than a predetermined value together with the values of those items.
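The attribution and thresholding step can be sketched as follows. Integrated gradients attributes a model output to input i as (x_i - b_i) times the average of the partial derivative along the straight path from a baseline b to the input x. The sketch below approximates that with a midpoint Riemann sum and finite-difference gradients; the linear `toy_brand_model`, the feature names, the zero baseline, and the threshold are all hypothetical stand-ins, since the disclosure does not specify the model or these values.

```python
from typing import Callable, Sequence


def integrated_gradients(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 50,
    eps: float = 1e-5,
) -> list[float]:
    """Approximate integrated-gradients attributions of f at x relative to baseline."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th slice of the path
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(len(x)):
            up, down = list(point), list(point)
            up[i] += eps
            down[i] -= eps
            # Central finite difference stands in for an autodiff gradient.
            avg_grad[i] += (f(up) - f(down)) / (2 * eps) / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


def items_to_report(names, x, attributions, threshold):
    """Keep (name, input value, contribution) where contribution exceeds threshold."""
    return [(n, v, a) for n, v, a in zip(names, x, attributions) if a > threshold]


def toy_brand_model(v: Sequence[float]) -> float:
    # Hypothetical linear scorer, used only to exercise the functions above.
    return 0.5 * v[0] + 2.0 * v[1] - 1.0 * v[2]


names = ["predicted_sales", "reference_rating", "return_rate"]
x = [4.0, 3.0, 1.0]
attributions = integrated_gradients(toy_brand_model, x, baseline=[0.0, 0.0, 0.0])
print(items_to_report(names, x, attributions, threshold=1.5))
```

For a real neural network, the finite-difference gradient would normally be replaced by the framework's automatic differentiation, and one attribution vector would be computed per output indicator.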

The memory (122) may store a program that performs the operations described above and below, and the processor (121) may execute the stored program. When there are a plurality of memories (122) and processors (121), they may be integrated into a single chip or provided at physically separate locations. The memory (122) may include volatile memory, such as static random access memory (S-RAM) or dynamic random access memory (D-RAM), for temporarily storing data. The memory (122) may also include nonvolatile memory, such as read only memory (ROM), erasable programmable read only memory (EPROM), or electrically erasable programmable read only memory (EEPROM), for long-term storage of control programs and control data. The processor (121) may include various logic circuits and arithmetic circuits, may process data according to a program provided from the memory (122), and may generate control signals according to the processing result.

The memory (122) according to one embodiment of the present disclosure may store the product information, collection information, review information, and brand information described above.

FIGS. 13 and 14 are flowcharts for explaining a training data preprocessing method according to one embodiment. FIG. 15 is a flowchart for explaining a review analysis method according to one embodiment. FIGS. 16 and 17 are flowcharts for explaining a brand value evaluation method according to one embodiment.

The methods illustrated in FIGS. 13 to 17 may be performed by the system (1000) or the device (100) described above. Therefore, even where omitted below, the description of the system (1000) and the device (100) applies equally to the methods described below.

Referring to FIG. 13, the device (100) may receive webpage data (S110).

The device (100) may identify collection information based on processing the webpage data (S120).

The device (100) may determine, based on processing the product information and the collection information, whether the product information and the collection information refer to the same product (S130).

In response to determining that they refer to the same product, the device (100) may update the product information (S140).

Referring to FIG. 14, the device (100) may calculate, based on processing the product information and the collection information, a sentence embedding for each product name included in the product information and the collection information (S210).

The device (100) may identify the cosine similarity between the sentence embeddings (S220).

The device (100) may determine whether the identified cosine similarity is greater than a predetermined value (S230).

In response to the identified cosine similarity being greater than the predetermined value, the device (100) may determine that the products are the same (S240).

Meanwhile, in response to the identified cosine similarity being smaller than the predetermined value, the device (100) may determine that the products are different (S250).
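Steps S220 to S250 can be sketched as follows, taking two precomputed product-name embeddings as input. The threshold value of 0.9 and the function names are hypothetical. The description does not say how a similarity exactly equal to the threshold is handled; this sketch treats it as a different product, which is an assumption.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_same_product(stored_vec: Sequence[float],
                    collected_vec: Sequence[float],
                    threshold: float = 0.9) -> bool:
    # S230 to S250: strictly greater than the threshold means "same product".
    return cosine_similarity(stored_vec, collected_vec) > threshold


# Hypothetical two-dimensional embeddings of two product names.
print(is_same_product([1.0, 0.0], [1.0, 0.1]))
print(is_same_product([1.0, 0.0], [0.0, 1.0]))
```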

Referring to FIG. 15, the device (100) may receive webpage data (S310).

The device (100) may identify, based on processing the webpage data, collection information that includes review information (S320).

The device (100) may identify at least one representative sentence of the review information based on processing the collection information (S330).

The device (100) may extract a rating of the review information by inputting the identified at least one representative sentence into the rating extraction model (S340).

Referring to FIG. 16, the device (100) may store product information, review information, and brand information (S410).

The device (100) may, based on processing the product information, input the product information into the sales volume prediction model to output a predicted sales volume (S420).

The device (100) may input the predicted sales volume, the review information, and the brand information into the brand value evaluation model to output at least one value indicator of the brand (S430).

Referring to FIG. 17, the device (100) may store product information (S510).

In addition, the device (100) may input the product information into a first sales volume prediction model to output a first predicted sales volume (S520).

In this case, the device (100) may assign a weight to the first predicted sales volume in inverse proportion to the prediction period (S530).

Meanwhile, the device (100) may input the product information into a second sales volume prediction model to output a second predicted sales volume (S540).

In this case, the device (100) may assign a weight to the second predicted sales volume in proportion to the prediction period (S550).

In addition, the device (100) may output a predicted sales volume based on the weighted first predicted sales volume and the weighted second predicted sales volume (S560).
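One possible reading of steps S530 to S560 is sketched below. It assumes that the prediction period is the number of steps ahead (1 for the first predicted date, 2 for the second, and so on), that the raw weights are 1/h for the first model and h for the second, and that the two weights are normalized to sum to 1. The normalization and all function names are assumptions of this sketch, not details given in the disclosure.

```python
from typing import List, Sequence, Tuple


def horizon_weights(horizon: int) -> Tuple[float, float]:
    """Return (first_weight, second_weight) for a given prediction horizon.

    The first weight is inversely proportional to the horizon (S530) and
    the second is proportional to it (S550). Normalizing them to sum to 1
    is an assumption made here so the result stays on the sales scale.
    """
    if horizon < 1:
        raise ValueError("horizon must be a positive integer")
    raw_first = 1.0 / horizon
    raw_second = float(horizon)
    total = raw_first + raw_second
    return raw_first / total, raw_second / total


def combine_forecasts(
    first: Sequence[float], second: Sequence[float]
) -> List[float]:
    """Blend two forecast series date by date (S560).

    Element i of each series is the forecast for horizon i + 1.
    """
    if len(first) != len(second):
        raise ValueError("forecast series must have the same length")
    combined = []
    for horizon, (f, s) in enumerate(zip(first, second), start=1):
        w_first, w_second = horizon_weights(horizon)
        combined.append(w_first * f + w_second * s)
    return combined
```

Under these assumptions the first model's output dominates near-term dates and the second model's output dominates distant dates. For example, `combine_forecasts([100.0, 100.0], [200.0, 200.0])` weights the two models equally at horizon 1 and 0.2 to 0.8 at horizon 2.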

Meanwhile, the disclosed embodiments may be implemented in the form of a recording medium storing computer-executable instructions. The instructions may be stored in the form of program code and, when executed by a processor, may generate program modules that perform the operations of the disclosed embodiments. The recording medium may be implemented as a computer-readable recording medium.

Computer-readable recording media include all types of recording media that store instructions readable by a computer. Examples include read-only memory (ROM), random access memory (RAM), magnetic tape, magnetic disks, flash memory, and optical data storage devices.

The disclosed embodiments have been described above with reference to the accompanying drawings. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be practiced in forms other than the disclosed embodiments without changing its technical idea or essential features. The disclosed embodiments are illustrative and should not be construed as limiting.

1000: System
100: Electronic device
110: Communication unit
120: Control unit
200: Server
300: User terminal

Claims

An electronic device for evaluating brand value, comprising:
a memory for storing product information and brand information; and
a processor for processing the product information and the brand information,
wherein the processor:
based on processing item information and review information included in the product information, inputs the product information into a sales volume prediction model to output a predicted sales volume; and
inputs the predicted sales volume and the brand information into a brand value evaluation model to output at least one value index of the brand.

The electronic device of claim 1,
wherein the processor:
inputs the product information into a first sales volume prediction model to output a first predicted sales volume;
inputs the product information into a second sales volume prediction model to output a second predicted sales volume; and
outputs the predicted sales volume based on the first predicted sales volume and the second predicted sales volume.

The electronic device of claim 2,
wherein the first sales volume prediction model is a model trained based on a Temporal Convolutional Network (TCN) algorithm, and the second sales volume prediction model is a model trained based on a Long Short-Term Memory (LSTM) algorithm.

The electronic device of claim 3,
wherein the predicted sales volume is time-series data of predicted sales volume per prediction date, and
wherein outputting the predicted sales volume comprises assigning weights to the first predicted sales volume and the second predicted sales volume, and outputting the predicted sales volume based on the weighted first predicted sales volume and the weighted second predicted sales volume.

The electronic device of claim 4,
wherein the processor assigns the weight of the first predicted sales volume in inverse proportion to the prediction period, and assigns the weight of the second predicted sales volume in proportion to the prediction period.

The electronic device of claim 1,
wherein the sales volume prediction model is trained based on a plurality of product names corresponding to a category and first item information corresponding to each of the product names.

The electronic device of claim 1,
wherein the sales volume prediction model is trained with an embedding layer included.

The electronic device of claim 1,
wherein the processor:
identifies, through integrated gradients, a contribution of data included in the product information input to the sales volume prediction model; and
transmits data having a contribution greater than a predetermined value to a user terminal.

The electronic device of claim 1,
wherein the brand value evaluation model is trained to receive the predicted sales volume, the review information, and the brand information as input and to output a review index score, a rating index score, a ranking index score, and a revenue index score.

A brand value evaluation method, comprising:
storing product information and brand information;
based on processing item information and review information included in the product information, inputting the product information into a sales volume prediction model to output a predicted sales volume; and
inputting the predicted sales volume and the brand information into a brand value evaluation model to output at least one value index of the brand.

The method of claim 10,
wherein outputting the predicted sales volume comprises:
inputting the product information into a first sales volume prediction model to output a first predicted sales volume;
inputting the product information into a second sales volume prediction model to output a second predicted sales volume; and
outputting the predicted sales volume based on the first predicted sales volume and the second predicted sales volume.

The method of claim 11,
wherein the first sales volume prediction model is a model trained based on a Temporal Convolutional Network (TCN) algorithm, and the second sales volume prediction model is a model trained based on a Long Short-Term Memory (LSTM) algorithm.

The method of claim 12,
wherein the predicted sales volume is time-series data of predicted sales volume per prediction date, and
wherein outputting the predicted sales volume comprises assigning weights to the first predicted sales volume and the second predicted sales volume, and outputting the predicted sales volume based on the weighted first predicted sales volume and the weighted second predicted sales volume.

The method of claim 13,
further comprising assigning the weight of the first predicted sales volume in inverse proportion to the prediction period, and assigning the weight of the second predicted sales volume in proportion to the prediction period.

The method of claim 10,
wherein the sales volume prediction model is trained based on a plurality of product names corresponding to a category and first item information corresponding to each of the product names.

The method of claim 10,
wherein the sales volume prediction model is trained with an embedding layer included.

The method of claim 10, further comprising:
identifying, through integrated gradients, a contribution of data included in the product information input to the sales volume prediction model; and
transmitting data having a contribution greater than a predetermined value to a user terminal.

The method of claim 10,
wherein the brand value evaluation model is trained to receive the predicted sales volume, the review information, and the brand information as input and to output a review index score, a rating index score, a ranking index score, and a revenue index score.

A computer-readable recording medium having recorded thereon a program for executing the brand value evaluation method of claim 10.