KR102759478B1

KR102759478B1 - Job search matching method and system

Info

Publication number: KR102759478B1
Application number: KR1020220099645A
Authority: KR
Inventors: 두일철; 오세종; 김태준; 심봉걸; 정동현
Original assignee: 한국외국어대학교 연구산학협력단
Priority date: 2022-08-10
Filing date: 2022-08-10
Publication date: 2025-01-22
Anticipated expiration: 2042-08-10
Also published as: KR20240021387A

Abstract

The present invention relates to a matching method and system capable of providing a user with a high-quality company recommendation service based on job postings on a job search site and, more particularly, to a method and system capable of matching and providing job postings that meet the user's requirements based on the user's selection. The disclosed method includes: a step of receiving from the user at least one desired corporate condition and at least one user specification; a step of learning and calculating a correlation between the companies satisfying the corporate condition and the corporate preference according to at least one of the user specifications; a step of determining a ranking of the companies satisfying the corporate condition according to the correlation; and a step of presenting a list of the companies to the user according to the corporate ranking.

Description

{JOB SEARCH MATCHING METHOD AND SYSTEM}

The present invention relates to a matching method and system capable of providing a user with a high-quality company recommendation service based on job postings on a job search site and, more particularly, to a method and system capable of matching and providing job postings that meet the user's requirements based on the user's selection.

In the past, job seekers obtained information from the job posting section of a newspaper or from other publications. Today, they can check the job postings of various companies through the Internet or app services, and can choose and apply to companies that match their qualifications and desired conditions.

Existing job search sites recommend companies by category-based company lists, reviews, benefits, and salary. To select a company to apply to, a user must therefore weigh all of this information together. Selecting one company among many requires checking each company's information one by one against the user's desired conditions, which consumes a considerable amount of time.

Therefore, if machine learning were used to recommend companies similar to a job posting that the user is interested in, the service would be easier to use than one that requires the user to enter multiple conditions directly into the system. There is a need for a system that is as concise and intuitive to use as possible while still providing satisfactory recommendation results to the user.

Accordingly, the present invention is intended to solve the problems described above and to provide a method and system capable of listing and recommending companies that suit the user's preferences.

The present invention also seeks to provide a method and system that identify the attributes affecting the achievement of the user's goal and minimize the amount of computation, thereby reducing the burden on the system while increasing accuracy.

The present invention further seeks to provide a method and system that use various visualization methods so that a user can grasp the characteristics of a company's job posting from the visualized data alone, without having to read the text closely.

To solve the problem described above, a job search matching method according to an embodiment of the present invention may include: a step of receiving from a user at least one desired corporate condition and at least one user specification; a step of learning and calculating a correlation between the companies satisfying the corporate condition and the corporate preference according to at least one of the user specifications; a step of determining a ranking of the companies satisfying the corporate condition according to the correlation; and a step of presenting a list of the companies to the user according to the corporate ranking.

According to one embodiment of the present invention, the correlation calculation step may include: a step of matching at least one user specification with a corporate preference; a step of learning by defining, as a probability, the correlation between a change in the user specification and a change in the corporate preference; and a step of calculating, according to the probability, the correlation between the corporate condition and the corporate preference according to the at least one user specification.

According to one embodiment of the present invention, in the correlation calculation step, the correlation may be calculated by Mathematical Formula 1 below.

[Mathematical Formula 1]

(Here, C(S, P) is the correlation, E(S) is the analysis relevance value, S is the user specification, and P is the corporate preference.)

According to one embodiment of the present invention, the user specification may include at least one of school, major, gender, age, region, and MBTI.

According to one embodiment of the present invention, the step of determining the ranking of the companies may determine a ranking that includes the corporate preference according to a combination of a plurality of the user specifications, and the step of presenting to the user may provide a result according to the ranking.

To solve the problem described above, a job search matching system according to an embodiment of the present invention may include: a data processing unit that receives from a user at least one desired corporate condition and at least one user specification; a learning unit that learns and calculates a correlation between the companies satisfying the corporate condition and the corporate preference according to at least one of the user specifications; a ranking unit that determines a ranking of the companies satisfying the corporate condition according to the correlation; and a visualization unit that presents a list of the companies to the user according to the corporate ranking.

According to one embodiment of the present invention, the learning unit may match at least one user specification with a corporate preference, learn by defining, as a probability, the correlation between a change in the user specification and a change in the corporate preference, and calculate, according to the probability, the correlation between the corporate condition and the corporate preference according to the at least one user specification.

According to one embodiment of the present invention, the learning unit may calculate the correlation by Mathematical Formula 1 below.

[Mathematical Formula 1]

According to one embodiment of the present invention, the learning unit may determine a ranking that includes the corporate preference according to a combination of a plurality of the user specifications, and the visualization unit may provide a result according to the ranking.

According to the present invention, companies that match the user's preferences can be listed and recommended in ranked order.

In addition, by identifying the attributes that affect the achievement of the user's goal and minimizing the amount of computation, the accuracy of prediction can be increased while the burden on the system is reduced.

In addition, by using various visualization methods, the user can grasp the characteristics of a company's job posting from the visualized data alone, without having to read the text closely.

Meanwhile, the effects of the present invention are not limited to those mentioned above, and various other effects may be included within a range that is obvious to those skilled in the art from the description below.

FIG. 1 is a block diagram of a job search matching system according to an embodiment of the present invention.
FIG. 2 is the in-analysis screen of a matching page to which a job search matching method according to an embodiment of the present invention is applied.
FIG. 3 is a matching result screen to which a job search matching method according to an embodiment of the present invention is applied.
FIG. 4 is a visualization result screen to which a job search matching method according to an embodiment of the present invention is applied.
FIG. 5 is the result screen shown when the desired information is not available on a matching page to which a job search matching method according to an embodiment of the present invention is applied.
FIG. 6 is the result screen shown when no company matches as a result of applying a job search matching method according to an embodiment of the present invention.
FIG. 7 is a matching result screen to which a job search matching method according to an embodiment of the present invention is applied.
FIG. 8 is a flowchart of a job search matching method according to an embodiment of the present invention.

Hereinafter, the 'job search matching method and system' according to the present invention will be described in detail with reference to the attached drawings. The described embodiments are provided so that those skilled in the art can easily understand the technical idea of the present invention, and the present invention is not limited by them. In addition, the drawings are schematic representations intended to make the embodiments easy to explain, and may differ from the form actually implemented.

Meanwhile, each component described below is only an example for implementing the present invention. Therefore, other implementations of the present invention may use other components without departing from the spirit and scope of the present invention.

In addition, each component may be implemented purely in hardware or purely in software, or as a combination of various hardware and software configurations that perform the same function. Two or more components may also be implemented together by a single piece of hardware or software.

In addition, the expression 'including' certain components is an open-ended expression that simply indicates that those components are present, and should not be understood as excluding additional components.

FIG. 1 is a block diagram of a job search matching system according to an embodiment of the present invention.

Referring to FIG. 1, a job search matching system (1) according to an embodiment of the present invention may include a data processing unit (100), a learning unit (200), and a visualization unit (300).

The data processing unit (100) can receive at least one corporate condition desired by the user and at least one user specification. The data processing unit (100) can collect companies' job posting information through job search sites and the like, and can do so by crawling those sites. The corporate conditions and the user specifications can each be received as one or more inputs.

To prepare standardized data on job postings, the data processing unit (100) can crawl a job search platform that holds job postings in a standardized form (for example, Jobplanet). Although the information provided by Jobplanet is relatively standardized compared with other job search platforms, some postings lack information, so the data processing unit (100) can apply exception handling to those postings and crawl them in a way that fits the situation. The data processing unit (100) can crawl the data with Selenium, a browser automation tool, and can collect the job postings for every job category over a period of 2 to 3 days in order to gather a large amount of data.

Although the format of the postings is unified, the wording differs from posting to posting and some information is unnecessary, so the data processing unit (100) can clean the text (stop word processing) and can include a tokenization step that extracts meaningful data and improves learning accuracy. According to one embodiment of the present invention, to find a tokenization method suited to the project, the data processing unit (100) may directly compare the widely used Korean morphological analyzers in KoNLPy (Mecab, Kkma, Okt, etc.) with the more recent Kiwi tokenizer, and may then adopt the Kiwi tokenizer, which uses a more diverse part-of-speech tag set. In the end, a combined Kiwi + Mecab tokenization method was used: the Mecab tokenizer handles the Korean-and-English combinations that the Kiwi tokenizer cannot recognize. The data processing unit (100) can store the tokenized data separately as welfare benefits data and qualifications data. The learning model can then be set up in the learning unit (200) so that, for similar jobs, the user can choose one of these two criteria and receive recommendations for similar companies.

If the company name and job in a job posting are treated as the tag and the posting text is treated as a document in a corpus, the data processing unit (100) can vectorize the documents in order to learn the similarity between documents labeled with a company name and job. To this end, the data processing unit (100) can use Doc2Vec, a document embedding technique that extends the word embedding technique Word2Vec. To apply Doc2Vec to the data, gensim can be used, a Python library that provides most of the convenience functions needed to convert natural language into vectors.

While refining the accuracy of the data preprocessing, the data processing unit (100) can be built into a system on the Flask framework so that users can conveniently try out the model learned through Doc2Vec (a vector model representing the similarity between the job postings of each company). Through the results rendered by the visualization unit (300), the user can view on the web well-organized job postings, visual materials, and a job posting similarity score that indicates how reliable the data is.

According to one embodiment of the present invention, the data processing unit (100) crawls all job postings registered on Jobplanet, and may use Selenium's WebDriver to crawl the dynamically rendered parts of the Jobplanet site, such as those implemented in JavaScript. Although Jobplanet provides relatively standardized information compared with other job search platforms, the number of information fields provided differs from company to company, so the data can be collected using exception handling.

When an exception occurs, the data processing unit (100) can crawl the affected page again. After crawling, the data can be split into company name, job posting title, closing date, job position, experience, employment type, salary, skills, company introduction, main duties, qualifications, preferred qualifications, hiring process, and welfare benefits, and saved as a CSV file.
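The re-crawl-on-exception behavior described above can be sketched as a small retry helper. This is a minimal illustration, not the patent's implementation: `crawl_with_retry`, the `fetch` callable, `max_attempts`, and `delay_seconds` are all hypothetical names, and `fetch` stands in for whatever Selenium-based page scraper is actually used.

```python
import time
from typing import Callable, Dict


def crawl_with_retry(
    fetch: Callable[[str], Dict[str, str]],
    url: str,
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> Dict[str, str]:
    """Call fetch(url); if it raises, crawl the same page again.

    `fetch` is a hypothetical stand-in for the real page scraper.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return fetch(url)
        except Exception as error:  # e.g. a field missing from the posting page
            last_error = error
            time.sleep(delay_seconds)
    raise RuntimeError(f"gave up on {url} after {max_attempts} attempts") from last_error
```

Capping the number of attempts is a design choice added here so that a permanently malformed posting cannot stall a multi-day crawl; the source text does not state a limit.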

The data processing unit (100) can preprocess the saved CSV file in a single batch (stop word processing, plus merging data of a similar nature into one column). For example, required skills, qualifications, and preferred qualifications are of the same nature, so they can be merged into the 'qualifications' column. Likewise, the job posting title, job, and main duties are of the same nature, so they can be merged into the 'task' column.
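The column merging described above might look like the following sketch on a single row. The English field names (`skills`, `preferred`, `title`, `job`, `main_duties`) and the `MERGE_PLAN` mapping are assumptions made for illustration; only the two target columns, 'qualifications' and 'task', come from the text.

```python
from typing import Dict, List

# Hypothetical source-column names; only the two target names come from the text.
MERGE_PLAN: Dict[str, List[str]] = {
    "qualifications": ["skills", "qualifications", "preferred"],
    "task": ["title", "job", "main_duties"],
}


def merge_columns(row: Dict[str, str]) -> Dict[str, str]:
    """Collapse same-natured fields of one posting into the merged columns."""
    source_names = {name for sources in MERGE_PLAN.values() for name in sources}
    # Keep every field that is not being merged (company name, salary, ...).
    merged = {key: value for key, value in row.items() if key not in source_names}
    for target, sources in MERGE_PLAN.items():
        parts = [row[name].strip() for name in sources if row.get(name, "").strip()]
        merged[target] = " ".join(parts)
    return merged
```

Empty fields are skipped, so a posting that omits, say, preferred qualifications still yields a clean merged string.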

Because an identical job posting with a different closing date would otherwise be recognized and stored as a separate posting, the data processing unit (100) can use drop_duplicates from the pandas module to exclude duplicate rows whose 'company name + job posting title' is the same.
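A minimal sketch of this deduplication with pandas follows. The column names `company`, `title`, and `deadline` and the sample rows are invented for illustration; keeping the first occurrence (`keep="first"`) is an assumption, since the text does not say which copy is retained.

```python
import pandas as pd

# Toy data: the first two rows are the same posting with different deadlines.
postings = pd.DataFrame(
    {
        "company": ["A", "A", "B"],
        "title": ["Backend Engineer", "Backend Engineer", "Backend Engineer"],
        "deadline": ["2022-08-01", "2022-09-01", "2022-08-15"],
    }
)

# Rows count as duplicates when company name and posting title both match,
# regardless of the deadline.
deduplicated = postings.drop_duplicates(subset=["company", "title"], keep="first")
```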

Because the raw data of the job posting will also be provided to the user, the data processing unit (100) can keep the original, unprocessed posting text in a separate column when performing stop word processing.

When saving to CSV, a "\r" character in a job posting causes an unintended row break, so the data processing unit (100) can replace \r with \n.
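The carriage-return handling can be sketched with the standard `csv` module. The helper names below are hypothetical, and collapsing a `\r\n` pair into a single `\n` before replacing a lone `\r` is an added assumption to avoid producing doubled line breaks.

```python
import csv
import io
from typing import List


def normalize_newlines(text: str) -> str:
    """Replace carriage returns with line feeds (\\r\\n is collapsed first)."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def write_and_read_back(rows: List[List[str]]) -> List[List[str]]:
    """Write rows to an in-memory CSV after normalization, then parse them again."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow([normalize_newlines(field) for field in row])
    buffer.seek(0)
    return list(csv.reader(buffer))
```

With the embedded line breaks normalized and the field quoted by the writer, a multi-line posting is read back as one row rather than spilling into the next.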

The data processing unit (100) can compare tokenization results from several tokenizers and then use the one that tokenizes best, that is, the one that best extracts the important words from a sentence.

A comparison of Kiwi, Mecab, Kkma, Okt, and others showed that the Kiwi and Mecab tokenizers distinguish the widest variety of part-of-speech tags and are the most suitable for learning.

The data processing unit (100) can use the Kiwi tokenizer to tokenize the stop-word-processed data, keeping common nouns, proper nouns, English words, and roots.
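Keeping only those four categories amounts to filtering tagged tokens by part-of-speech tag. The sketch below assumes the Sejong-style tags NNG (common noun), NNP (proper noun), SL (Latin-alphabet word), and XR (root); the input is a hand-written list of (form, tag) pairs, not actual Kiwi output, and `keep_content_tokens` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple

# Assumed tag set: common noun, proper noun, Latin-alphabet word, root.
KEPT_TAGS = {"NNG", "NNP", "SL", "XR"}


def keep_content_tokens(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the surface forms whose part-of-speech tag is in KEPT_TAGS."""
    return [form for form, tag in tagged if tag in KEPT_TAGS]
```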

The Kiwi tokenizer raises an exception when it encounters a Korean-and-English combination whose meaning it does not know. For example, an exception can occur on '리깅TA' ("rigging TA"), a term that appears frequently in job postings from game developers.

For data that raises an exception in Kiwi, the data processing unit (100) can tokenize it with the Mecab tokenizer instead.
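The Kiwi-then-Mecab fallback can be sketched independently of either library by passing the two tokenizers in as callables. `tokenize_with_fallback`, `primary`, and `fallback` are hypothetical names; in practice `primary` would wrap the Kiwi tokenizer and `fallback` the Mecab tokenizer.

```python
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]


def tokenize_with_fallback(text: str, primary: Tokenizer, fallback: Tokenizer) -> List[str]:
    """Tokenize with `primary`; if it raises, tokenize the same text with `fallback`."""
    try:
        return primary(text)
    except Exception:
        # The primary tokenizer could not handle this text (for example a
        # mixed Korean-and-English token), so hand it to the fallback.
        return fallback(text)
```

Falling back per document rather than per token keeps the sketch simple; whether the original system retries at document or token granularity is not stated in the text.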

Using the tokenizer developed in this way, the data processing unit (100) can create two CSV files: for example, one in which the welfare benefits and task columns are tokenized, and one in which the qualifications and task columns are tokenized.

The data processing unit (100) additionally tokenizes the task column in each CSV file because two learning models are built: for example, a model trained on welfare benefits or qualifications, plus a model trained on the task column.

The data processing unit (100) uses two learning models because it recommends companies that resemble the company entered by the user on one of the two criteria, welfare benefits or qualifications. If those are similar but the job being hired for is different, the recommendation is meaningless to the user.

Users look for jobs in their own field of expertise, and the types of jobs within any given field are generally limited.

The data processing unit (100) trains a Doc2Vec model on the tokenized data, adjusting the model's hyperparameters and adopting the values that give the best accuracy.

Doc2Vec was chosen as the machine learning method because, compared with a document embedding method built on TensorFlow, it was faster and could be trained while adjusting several hyperparameters, and was therefore judged to be better suited to this training.

Of the two trained models, the data processing unit (100) can first use the model trained on the task column to extract the indices of companies that are at least 70% similar to the company entered by the user.

The lower bound on job similarity was set to 70% because a large amount of data was used to train the job model (the model trained on the task column, into which the job posting title, job, and main duties were merged), and postings with a similarity of only 70% still proved similar when the actual data was compared.

From the companies extracted by the job model, the data processing unit (100) then extracts similar companies using the welfare benefits or qualifications model and outputs them in descending order of similarity. The final result therefore consists of companies whose jobs are at least 70% similar and whose welfare benefits or qualifications are also similar.
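The two-stage selection above (filter by job similarity, then rank by the second criterion) can be sketched with plain cosine similarity over toy vectors. This is an illustration only: it assumes that "70% similar" means a cosine similarity of at least 0.7, and the vectors stand in for embeddings that the two Doc2Vec models would produce. The function and parameter names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(
    query: str,
    task_vectors: Dict[str, Sequence[float]],
    benefit_vectors: Dict[str, Sequence[float]],
    task_threshold: float = 0.7,
) -> List[Tuple[str, float]]:
    """Stage 1: keep postings whose task vector is similar enough to the query.
    Stage 2: rank the survivors by similarity on the second criterion."""
    candidates = [
        name
        for name in task_vectors
        if name != query and cosine(task_vectors[query], task_vectors[name]) >= task_threshold
    ]
    scored = [(name, cosine(benefit_vectors[query], benefit_vectors[name])) for name in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Filtering on the job model first means a posting with near-identical benefits but an unrelated job never reaches the ranking stage, which is the rationale the text gives for using two models.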

The learning unit (200) can learn and calculate the correlation between the companies satisfying the corporate condition and the corporate preference according to at least one of the user specifications. The learning unit (200) can select and list the companies that satisfy the corporate condition. From that list, the learning unit (200) can rank the companies the user is likely to prefer according to the correlation with the corporate preference associated with the user's specifications.

The learning unit (200) can match at least one user specification with a corporate preference. The learning unit (200) can learn by defining, as a probability, the correlation between a change in the user specification and a change in the corporate preference. The learning unit (200) can calculate, according to the probability, the correlation between the corporate condition and the corporate preference according to the at least one user specification.

The learning unit (200) can measure the correlation by defining, as a probability, the correlation between a change in the user specification and a change in the corporate preference, and learning it.

The learning unit (200) can calculate the influence between the companies satisfying the corporate condition and the corporate preference according to at least one of the user specifications.

The influence is a value that depends on how strongly a change in the user specification affects the corporate preference, and can be calculated by Mathematical Formula 1 below.

[Mathematical Formula 1]

Here, p(x) is the probability that a change in preference x occurs. The closer this probability is to 1, the closer the influence is to 0, so fewer cases need to be learned and the amount of computation decreases. The closer the probability is to 0, the larger the influence grows without bound, so more cases need to be learned and the amount of computation can increase.
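The formula image itself is not reproduced in this text. One form consistent with the behavior just described (influence 0 at p(x) = 1, unbounded as p(x) approaches 0) is the standard information content. The symbol I(x) and the base-2 logarithm are assumptions, not taken from the source:

```latex
I(x) = \log_2 \frac{1}{p(x)} = -\log_2 p(x), \qquad 0 < p(x) \le 1
```

For instance, under this reading p(x) = 1 gives I(x) = 0, and p(x) = 1/8 gives I(x) = 3.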

In the learning unit (200), the computational load is the expected value (mean) of the influence. A large computational load means a large average influence, and the greater the uncertainty, the harder the classification becomes. The amount of computation can therefore be reduced by placing the attribute with the smallest computational load at the upper decision node. The computational load can be obtained from Mathematical Formula 2 below.

[Mathematical Formula 2]

Here, S denotes the collection of all events that have already occurred, and c denotes the number of events.
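The formula image is not reproduced in this text. If the computational load is the expected value of an influence of the form -log2 p, as the description suggests, it would take the form of Shannon entropy. The index i, the notation p_i for the probability of the i-th of the c events in S, and the base-2 logarithm are assumptions:

```latex
E(S) = \sum_{i=1}^{c} p_i \log_2 \frac{1}{p_i} = -\sum_{i=1}^{c} p_i \log_2 p_i
```

Under this reading, two equally likely events give E(S) = 1, while a single certain event gives E(S) = 0.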

상기 학습부(200)에서 상기 상관 관계는 하기 수학식 3을 연산해 구할 수 있다.In the above learning unit (200), the correlation can be obtained by calculating the following mathematical expression 3.

[수학식 3][Mathematical Formula 3]

여기서, C(S, P)는 상관 관계, E(S)는 분석관련성 값, S는 사용자 스펙, P는 기업 선호도이다. 여기서 P는 사용자 스펙 중 특정 스펙을 의미하며 어떤 스펙을 가지고 분류했을 때 가장 연산부하가 작은지(즉, 정보획득량이 큰 것)를 판단할 수 있다.Here, C(S, P) is the correlation, E(S) is the analysis relevance value, S is the user specification, and P is the corporate preference. Here, P refers to a specific specification among the user specifications, and it can be used to determine which specification has the smallest computational load (i.e., the largest amount of information obtained) when classified.

The learning unit (200) can rank the user specifications according to the correlation. The learning unit (200) can rank them in ascending order of correlation, lowest first. The learning unit (200) can give recommendation priority to the user specifications with low correlation.
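The ranking of user specifications can be illustrated with a small sketch. It assumes the computational load is the Shannon entropy of the preference labels and that each specification is scored by the weighted entropy that remains after grouping the records by that specification; the function names, field names, and sample records are hypothetical and do not come from the patent.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (base 2) of a list of labels: the assumed 'computational load'."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def split_entropy(records, spec, target):
    """Weighted entropy of `target` after grouping the records by the value of `spec`."""
    groups = defaultdict(list)
    for row in records:
        groups[row[spec]].append(row[target])
    total = len(records)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def rank_specs(records, specs, target):
    """Order specifications from smallest to largest remaining entropy."""
    return sorted(specs, key=lambda s: split_entropy(records, s, target))


# Hypothetical records: two user specifications and the preferred company.
records = [
    {"major": "CS", "region": "Seoul", "preferred": "A"},
    {"major": "CS", "region": "Busan", "preferred": "A"},
    {"major": "Law", "region": "Seoul", "preferred": "B"},
    {"major": "Law", "region": "Busan", "preferred": "B"},
]
ranking = rank_specs(records, ["region", "major"], "preferred")
```

In this toy data, grouping by major separates the preferences completely, so major would sit at the upper decision node ahead of region.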

상기 사용자 스펙은 학교, 전공, 성별, 나이, 지역, MBTI 중 적어도 하나 이상을 포함할 수 있다.The above user specifications may include at least one of school, major, gender, age, region, and MBTI.

상기 학습부(200)는 복수의 상기 사용자 스펙의 조합에 따른 기업 선호도를 포함하는 순위를 결정할 수 있다.The above learning unit (200) can determine a ranking including corporate preferences based on a combination of multiple user specifications.

상기 시각화부(300)는 상기 상관 관계에 따른 상기 기업 순위에 따라 상기 기업의 리스트를 상기 사용자에 제시할 수 있다.The above visualization unit (300) can present a list of companies to the user according to the company ranking according to the correlation.

상기 시각화부(300)는 복수의 상기 사용자 스펙에 따른 기업 선호도의 상관 관계의 순위에 따른 결과를 제공할 수 있다. 상기 시각화부(300)는 상기 사용자 스펙들의 복수의 조합에 따른 기업 선호도의 상관 관계의 순위에 따른 결과를 제공할 수 있다. The above visualization unit (300) can provide a result according to the ranking of the correlation of corporate preferences according to a plurality of the above user specifications. The above visualization unit (300) can provide a result according to the ranking of the correlation of corporate preferences according to a plurality of combinations of the above user specifications.

상기 시각화부(300)는 사용자가 유사한 기업을 추천 받고 추천 받은 기업 공고의 정보를 확인 할 때, 시각적으로 어떤 부분이 유사했는지 확인하기 쉽도록 데이터를 시각화하여 사용자에게 제공할 수 있다. 상기 시각화부(300)는 각각의 기업 공고 내용에 자주 등장하는 핵심 단어를 다양하게 시각화하는 Wordcloud 방식을 사용할 수 있다. 상기 시각화부(300)는 additional web page에 유사한 기업들을 2차원 좌표평면 상에 TSNE(고차원의 벡터 데이터의 차원을 축소 하는 기법)를 이용해 보여줌으로써, 사용자에게 직관적이고 흥미로운 방법으로 유사한 기업을 탐색해볼 수 있는 기능도 제공할 수 있다.The visualization unit (300) above can provide data visualized to the user so that the user can easily check which parts are visually similar when the user receives recommendations for similar companies and checks the information of the recommended company announcements. The visualization unit (300) can use a word cloud method that visualizes key words that frequently appear in the content of each company announcement in various ways. The visualization unit (300) can also provide the user with a function to search for similar companies in an intuitive and interesting way by showing similar companies on an additional web page on a two-dimensional coordinate plane using TSNE (a technique for reducing the dimension of high-dimensional vector data).

The visualization unit (300) can deploy the system developed up to the 'data embedding' stage on a Flask web server and output its results. To send data from the back end to the front end, the back end passes the data as parameters when calling the render_template function, which renders an HTML file for the client. The front end can use JavaScript and jQuery to display the received data in the components placed in the HTML. To send data from the front end to the web server, the GET method of a form tag can be used, or the data can be included in the URL requested by the front end and then processed in the back end.

The web UI can be built intuitively and concisely using various Bootstrap elements (e.g., progress bar, radio button, table, card).

When the user has to wait for the system to finish a computation, such as when the system is performing new learning or generating visualization data, a progress bar can be used to inform the user which operation is in progress.

html 요소들의 구성은, <head>의 style 태그에서 css 문법을 이용하여 배치와 크기를 디자인할 수 있다.The composition of HTML elements can be designed in terms of layout and size using CSS grammar in the style tag of <head>.

시각화 정보를 보여주는 html 파일의 경우, 웹 브라우저에 캐싱된 데이터가 아닌 새로 바뀐 정보를 제공해야 하기 때문에, Cache-Control를 이용하여 캐싱을 하지 않도록 막고, 시각화 이미지를 부를시에 URL에 랜덤한 숫자를 추가하여 서버에 전송 함으로써, 같은 이름의 이미지 파일에 다른 시각화 정보를 덮어씌워 사용하더라도 바뀐 이미지가 사용자에게 제공하도록 구현할 수 있다.In the case of html files that display visualization information, since newly changed information must be provided, not data cached in the web browser, caching can be prevented using Cache-Control, and when calling a visualization image, a random number is added to the URL and transmitted to the server, so that even if an image file with the same name is overwritten with other visualization information, the changed image can be provided to the user.
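The two measures described above can be sketched as follows. This is an illustration only: the header value, the query parameter name `v`, and the helper name are assumptions, not details given in the patent.

```python
import random
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Response headers that ask the browser not to reuse a cached copy (assumed values).
NO_CACHE_HEADERS = {
    "Cache-Control": "no-store, no-cache, must-revalidate, max-age=0",
}


def cache_busted_url(url, rng=random):
    """Append a random number as a query parameter so the browser refetches the image."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("v", str(rng.randrange(10**9))))  # "v" is a hypothetical parameter name
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical image path; the file name itself stays the same between requests.
image_url = cache_busted_url("/static/plot.png")
```

Because only the query string changes, the server can keep overwriting the same image file while each page load requests what the browser treats as a new resource.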

An example web page layout is as follows:

Start page: hello.html

Job posting link input page: apply.html

Loading page: index.html

Analysis results page: result.html

Company job posting details page: linktoimg.html

Page reporting that there is no 'welfare benefits' information: nobokri.html

Page reporting that there are no similar companies: nolist.html

Page showing visualization information: display_plot.html

사용자는 apply.html에 잡플래닛 구인공고 링크를 입력한 뒤, 어떤 기준(복리후생 or 자격요건)에 따라 유사한 기업을 추천받을 것인지 입력하게 된다.Users enter the Jobplanet job posting link in apply.html, and then enter the criteria (welfare or qualification requirements) by which they would like to receive recommendations for similar companies.

상기 시각화부(300)는 웹 서버에서는 사용자가 입력한 링크와 기준에 따라, 학습된 모델에서 유사한 기업을 추출하여 result.html에서 table을 통해 표 형식으로 사용자에게 제공한다.The above visualization section (300) extracts similar companies from the learned model based on the links and criteria entered by the user on the web server and provides the results to the user in a table format through result.html.

상기 시각화부(300)는 사용자는 result.html의 각 기업의 [자세히보기]를 클릭하여 해당 기업 구인공고의 키워드를 분석한 WordCloud 결과와, 전처리 되지 않은 raw 구인공고 내용을 제공받는다.The above visualization section (300) provides the user with a WordCloud result that analyzes keywords of the job posting of the company by clicking [View Details] of each company in result.html, and the raw job posting content that has not been preprocessed.

If the user requests recommendations based on welfare benefits while the posting at the entered link contains no 'welfare benefits' information, the visualization unit (300) displays nobokri.html and asks the user to enter a link again.

상기 시각화부(300)는 사용자가 입력한 링크의 기업과 유사한 기업이 없는 경우(직무와 기준이 모두 유사한 기업이 없는 경우), 시스템은 nolist.html을 보여줌으로써 유사 기업이 없음을 알리고, 사용자에게 '입력받은 기준'이 유사하지는 않지만, '직무'가 유사한 다른 기업들을 추천해준다.In the above visualization section (300), if there is no company similar to the company in the link entered by the user (if there is no company with similar job and criteria), the system displays nolist.html to inform the user that there is no similar company, and recommends other companies that are not similar in 'input criteria' but have similar 'jobs'.

상기 시각화부(300)는 result.html에서 사용자는 시각화 정보를 보여주는 display_plot.html로 넘어갈 수 있다.In the above visualization section (300), the user can move from result.html to display_plot.html, which shows visualization information.

In display_plot, the visualization unit (300) uses Plotly.js to draw a scatter plot in the HTML page from the t-SNE coordinate data received from the server.

The visualization unit (300) then shows two plots: a scatter plot of companies similar to the user-entered company in the 'user-selected criterion', and a scatter plot of companies similar to it in 'job'.

사용자는 각 plot의 marker를 클릭함으로써, 해당 marker에 해당하는 기업의 구인공고내용을linktoimg.html로 이동하여 제공받는다.By clicking on the marker of each plot, the user is directed to linktoimg.html and provided with the job posting information of the company corresponding to the marker.

상기 시각화부(300)는 linktoimg.html에서는 WordCloud 결과와 raw 구인공고 내용 뿐만 아니라, 사용자로 하여금 시스템이 추천한 데이터의 신뢰도를 직접 확인할 수 있도록, 각 기준에 따른 '유사도'또한 제공하게 된다.The above visualization section (300) provides not only the WordCloud results and raw job posting content in linktoimg.html, but also ‘similarity’ according to each criterion so that the user can directly check the reliability of the data recommended by the system.

또한 linktoimg.html에서는 사용자가 '시각화 결과' 페이지에서 누르는 marker에 따라, 빨간 marker를 누르는 경우에는 그것이 사용자가 입력한 구인공고임을 알리고, 파란 marker를 누른 경우에는 해당 구인 공고가 사용자 입력 구인공고와 '어떤 기준'에서 유사하다고 분석했는지 제공하도록 했다.In addition, linktoimg.html provides information on which marker the user clicks on the 'Visualization Results' page. If the user clicks on the red marker, it informs them that it is the job posting they entered, and if the user clicks on the blue marker, it provides information on 'what criteria' the job posting was analyzed to be similar to the user-entered job posting.

각 구인공고 글의 임베딩 벡터는 200차원의 고차원 데이터이다.The embedding vector of each job posting is a 200-dimensional high-dimensional data.

상기 시각화부(300)는 사용자에게 시각화된 정보를 제공하기 위해, TSNE 모듈을 사용하여 데이터를 2차원으로 축소 시킨 뒤, 2차원 평면에 시각화 하여 제공한다.The above visualization unit (300) reduces data to two dimensions using the TSNE module and then provides the data by visualizing it on a two-dimensional plane in order to provide visualized information to the user.

Rather than visualizing every company, the visualization unit (300) visualizes only the companies within a certain distance of the user-entered company (the most similar companies).

상기 시각화부(300)는 시각화된 정보들 중, 임의의 두 marker 간의 위치 차이가 매우 극소하여 하나의 marker처럼 보이는 현상을 방지하기 위해, 임의의 두 marker사이의 거리가 극소할 경우, 둘 중 하나의 위치를 조금 변경하여 겹치지 않도록 시각화를 수행한다.The above visualization unit (300) performs visualization by slightly changing the position of one of the two markers so that they do not overlap, in order to prevent a phenomenon in which the positional difference between any two markers among the visualized information is so minimal that they appear as one marker.
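One way to realize this overlap avoidance is sketched below. The minimum distance, the step size, and the outward-spiral search are all assumptions chosen for illustration; the patent only states that one of the two markers is moved slightly.

```python
import math


def separate_markers(points, min_dist=0.05, step=0.05):
    """Return 2D points in which near-coincident markers have been nudged apart."""
    placed = []
    for x0, y0 in points:
        x, y = x0, y0
        attempt = 0
        # Search outward along a spiral around the original position until the
        # candidate is at least min_dist away from every marker placed so far.
        while any(math.hypot(x - px, y - py) < min_dist for px, py in placed):
            attempt += 1
            angle = attempt * 2.399963  # golden angle in radians, varies the direction
            radius = step * attempt
            x = x0 + radius * math.cos(angle)
            y = y0 + radius * math.sin(angle)
        placed.append((x, y))
    return placed


# Hypothetical t-SNE output in which the first two companies coincide.
markers = separate_markers([(0.0, 0.0), (0.0, 0.0), (1.0, 1.0)])
```

Markers that are already far enough apart keep their original coordinates, so the overall layout of the plot is preserved.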

The visualization unit (300) generates and provides a WordCloud using the tokenized welfare benefits/qualification requirements data as keywords.

이때, 사용자가 유사도 판단의 기준으로 '복리후생'을 선택한 경우 '복리후생'토큰화 데이터로 만든 WordCloud 결과를 보여주고, '자격요건'을 선택한 경우, '자격요건' 토큰화 데이터로 만든 WordCloud 결과를 보여준다.At this time, if the user selects ‘welfare benefits’ as the criterion for judging similarity, the WordCloud result created with ‘welfare benefits’ tokenized data is shown, and if the user selects ‘qualification requirements’, the WordCloud result created with ‘qualification requirements’ tokenized data is shown.

도 2는 본 발명의 일 실시 예에 따른 구인구직 매칭 방법을 적용한 매칭 페이지의 분석 중 화면이고, 도 3은 본 발명의 일 실시 예에 따른 구인구직 매칭 방법을 적용한 매칭 결과 화면이고, 도 4는 본 발명의 일 실시 예에 따른 구인구직 매칭 방법을 적용한 시각화 결과 화면이고, 도 5는 본 발명의 일 실시 예에 따른 구인구직 매칭 방법을 적용한 매칭 페이지에서 원하는 정보가 없을 때 결과 화면이고, 도 6은 본 발명의 일 실시 예에 따른 구인구직 매칭 방법을 적용한 결과 매칭되는 기업이 없을 때 결과 화면이고, 도 7은 본 발명의 일 실시 예에 따른 구인구직 매칭 방법을 적용한 매칭 결과 화면이다.FIG. 2 is a screen during analysis of a matching page to which a job search matching method according to an embodiment of the present invention is applied, FIG. 3 is a matching result screen to which a job search matching method according to an embodiment of the present invention is applied, FIG. 4 is a visualization result screen to which a job search matching method according to an embodiment of the present invention is applied, FIG. 5 is a result screen when there is no desired information on a matching page to which a job search matching method according to an embodiment of the present invention is applied, FIG. 6 is a result screen when there is no matching company as a result of which a job search matching method according to an embodiment of the present invention is applied, and FIG. 7 is a matching result screen to which a job search matching method according to an embodiment of the present invention is applied.

Referring to FIGS. 2 to 7, according to one embodiment of the present invention, the user copies the URL of a desired company's job posting from a job search site (for example, Jobplanet), enters it into the system, and selects the criterion of interest, either the welfare benefits or the qualification requirements of that posting. The data processing unit (100) then collects the job posting at the entered URL from the job search site, and the visualization unit (300) analyzes the company and the advertised position and outputs similar companies in table form on a web page. While the model is computing, a loading screen can be displayed to the user, and the analysis results can be provided once the computation finishes.

상기 시각화부(300)에서 표시한 표에 나타난 기업들의 'Details' column의 [자세히보기]를 클릭하면, 해당 기업의 실제 구인공고 내용과, 그 내용을 키워드로 분석한 WordCloud 분석결과를 사용자에게 제공할 수 있다.By clicking [View Details] in the 'Details' column of the companies shown in the table in the visualization section (300) above, the user can be provided with the actual job posting content of the company and the WordCloud analysis results that analyze the content by keyword.

Clicking [Go to visualization data] below the tabular analysis results produces two-dimensional visualizations, one for companies similar in [job] to the user-entered company and one for companies similar in [the user-entered criterion], and provides them to the user.

상기 시각화부(300)의 시각화 자료에서 빨간 점은 사용자가 입력한 URL의 기업이고, 파란점은 빨간 점의 기업과 유사하다고 분석된 기업들을 의미할 수 있다.In the visualization data of the above visualization section (300), the red dots may represent the company whose URL the user entered, and the blue dots may represent companies analyzed as being similar to the company of the red dots.

상기 시각화부(300)는 위의 시각화 자료에서 점들을 클릭한 경우에도, 그 점에 해당하는 기업에 대한 [자세히보기]정보를 사용자에게 제공할 수 있다.The above visualization section (300) can provide the user with [View Details] information about the company corresponding to the point even when the points in the above visualization data are clicked.

상기 시각화부(300)는 기업 구인 공고의 자세한 정보를 제공하는 페이지에서는 실제 구인 공고의 내용과 키워드 WordCloud 분석 결과, 그리고 사용자가 입력한 기업과 얼마나 유사한지 %로 나타낼 수 있다.The above visualization section (300) can display the actual content of the job posting, the keyword WordCloud analysis results, and the percentage of similarity with the company entered by the user on the page providing detailed information on the company job posting.

상기 시각화부(300)는 사용자가 입력한 URL의 채용공고에 '복리후생' 정보가 없고, 사용자가 '복리후생'을 기준으로 분석한 결과를 요청한 경우, 링크에 복리후생 정보가 없음을 알리고, 사용자가 URL을 다시 입력할 것을 요청할 수 있다.If the job posting in the URL entered by the user does not contain ‘welfare’ information and the user requests the results analyzed based on ‘welfare’, the visualization unit (300) may notify that there is no welfare information in the link and request the user to re-enter the URL.

상기 시각화부(300)는 사용자가 입력한 URL이 모델이 학습하지 않은 새로운 데이터인 경우, 추가적인 학습을 하는 데에 시간이 좀 더 소요되기 때문에, 로딩시간을 조정하고 사용자에게 어떤 연산을 하는 중인지 나타낼 수 있다.The above visualization unit (300) can adjust the loading time and show the user what operation is being performed when the URL entered by the user is new data that the model has not learned, because it takes more time to perform additional learning.

상기 시각화부(300)는 분석결과 비슷한 기업이 없는 경우(직무와 사용자 입력 기준이 모두 유사한 기업이 없는 경우), 이를 사용자에게 알릴 수 있다. 상기 시각화부(300)는 이 후 오른쪽 하단에 직무만 비슷한 기업들을 확인할 수 있는 링크를 제공할 수 있다. 사용자가 해당 링크를 클릭한 경우, 표 형식의 유사 기업리스트를 출력하지만, [직무가 비슷한 기업] 이라고 명시하여 사용자가 어떤 기준으로 분석된 내용인지 인지할 수 있도록 한다.The visualization unit (300) above can inform the user if there is no similar company in the analysis result (if there is no company with similar job and user input criteria). The visualization unit (300) can then provide a link at the bottom right to check companies with similar jobs. If the user clicks on the link, a list of similar companies in table format is output, but [companies with similar jobs] is specified so that the user can recognize the criteria by which the analysis was conducted.

도 8은 본 발명의 일 실시 예에 따른 구인구직 매칭 방법의 흐름도이다.Figure 8 is a flow chart of a job search matching method according to one embodiment of the present invention.

도 8을 참조하면, 본 발명의 일 실시 예에 따른 구인구직 매칭 방법은 사용자가 원하는 기업 조건 및 사용자 스펙을 적어도 하나 이상 입력 받는 단계(S110)를 포함할 수 있다.Referring to FIG. 8, a job search matching method according to an embodiment of the present invention may include a step (S110) of receiving input of at least one desired company condition and user specification from a user.

S110 단계에서, 상기 데이터 가공부(100)는 사용자가 원하는 기업 조건 및 사용자 스펙을 적어도 하나 이상 입력 받을 수 있다. 상기 데이터 가공부(100)는 구인 구직 사이트 등을 통해서 기업의 구인 구직 정보를 수집할 수 있다. 상기 데이터 가공부(100)는 구인 구직 사이트에서 기업의 구인 구직 정보를 크롤링해 수집할 수 있다. 상기 데이터 가공부(100)는 상기 사용자가 원하는 기업 조건을 적어도 하나 이상 입력 받을 수 있다. 상기 데이터 가공부(100)는 상기 사용자의 스펙을 적어도 하나 이상 입력 받을 수 있다. In step S110, the data processing unit (100) can receive at least one desired corporate condition and user specification from the user. The data processing unit (100) can collect corporate job search information through job search and recruitment sites, etc. The data processing unit (100) can crawl and collect corporate job search and recruitment information from job search and recruitment sites. The data processing unit (100) can receive at least one desired corporate condition from the user. The data processing unit (100) can receive at least one specification from the user.

S110 단계에서, 상기 데이터 가공부(100)는 구인 구직 정보에 관한 정형화된 데이터를 마련하기 위해, 여러 채용사이트들 중 정형화된 채용공고의 데이터를 가지고 있는 구인 구직 사이트(예를 들어, 잡플래닛) 플랫폼을 사용해 크롤링할 수 있다. 상기 데이터 가공부(100)는 잡플래닛이 다른 구인구직 플랫폼보다 제공하는 정보가 상대적으로 정형화 돼 있지만, 정보가 부족한 채용공고에 대해, 예외처리를 해주어 상황에 맞는 크롤링을 진행할 수 있다. 상기 데이터 가공부(100)는 브라우저 자동화 도구인 Selenium을 통해 데이터 크롤링을 진행할 수 있고, 대량의 데이터를 수집하기 위해 모든 직무별 채용공고를 2~3일에 걸쳐 수집할 수 있다.In step S110, the data processing unit (100) can crawl using a platform of a job search site (e.g., Jobplanet) that has data of standardized job postings among various job search sites in order to prepare standardized data on job search information. The data processing unit (100) can perform crawling appropriate to the situation by making exceptions for job postings that have insufficient information, although the information provided by Jobplanet is relatively standardized compared to other job search platforms. The data processing unit (100) can perform data crawling using Selenium, a browser automation tool, and can collect all job postings for each job over a period of 2 to 3 days in order to collect a large amount of data.

In step S110, although the format of the postings is uniform, the data processing unit (100) can correct the differing writing styles and unneeded information in each job posting (stop-word processing), and can include a data tokenization step to extract meaningful data and improve learning accuracy. According to one embodiment of the present invention, to find a tokenization method suited to the project, the data processing unit (100) can compare the morphological analyzers available through KoNLPy, a widely used Korean NLP package (Mecab, Kkma, Okt, and others), with the more recent Kiwi tokenizer, and then adopt the Kiwi tokenizer, which uses a more varied part-of-speech tag set. In the end, the Kiwi and Mecab tokenizers were combined: the Mecab tokenizer was used for Korean-English combinations that the Kiwi tokenizer could not recognize. The data processing unit (100) can store the tokenized data separately for welfare benefits and for qualification requirements. The learning model can later be configured in the learning unit (200) so that the user selects one of the two criteria and receives recommendations of similar companies for similar jobs.

In step S110, if the company name and job title in a job posting are treated as tags and the posting text is treated as a document in a corpus, the data processing unit (100) can vectorize the documents in order to learn the similarity between documents labeled by company name and job title. For this purpose, the data processing unit (100) can use Doc2Vec, a document embedding technique that extends the word embedding technique Word2Vec. To apply Doc2Vec to the collected data, gensim, a Python library that provides most of the convenience functions needed to convert natural language into vectors, can be used.

S110 단계에서, 상기 데이터 가공부(100)는 데이터 전처리의 정확도를 정교화하며 Doc2Vec을 통해 학습한 모델(각 기업 채용 공고마다 유사도를 나타내는 벡터모델)을 사용자입장에서 편리하게 사용해 볼 수 있도록, 시스템을 Flask 프레임워크로 개발했다. 사용자는 상기 시각화부(300)를 통해 구현된 결과로 잘 정리된 채용 공고와 시각적인 자료, 데이터의 신뢰도를 확인할 수 있는 채용 공고 유사도를 웹을 통해 확인할 수 있다.In step S110, the data processing unit (100) refines the accuracy of data preprocessing and develops a system using the Flask framework so that users can conveniently use the model (vector model representing the similarity for each company's job posting) learned through Doc2Vec. Users can check well-organized job postings, visual data, and job posting similarity that can confirm the reliability of data through the web as a result implemented through the visualization unit (300).

S110 단계에서, 본 발명의 일 실시 에에 따르면, 상기 데이터 가공부(100)는 잡플래닛에 등록되어 있는 모든 구인공고들에 대해 크롤링을 실시하고, 잡플래닛 사이트의 JavaScript등 동적으로 구현된 부분을 크롤링하기 위해 Selenium의 웹 드라이버를 이용할 수 있다. 잡플래닛이 다른 구인구직 플랫폼보다 상대적으로 정형화된 정보를 제공하지만, 기업마다 제공하는 정보 종류의 개수가 다르기 때문에, 예외 처리를 이용하여 데이터를 수집할 수 있다.In step S110, according to one embodiment of the present invention, the data processing unit (100) crawls all job postings registered in Jobplanet and may use Selenium's web driver to crawl dynamically implemented parts such as JavaScript of the Jobplanet site. Although Jobplanet provides relatively standardized information compared to other job search platforms, the number of types of information provided by each company is different, so data can be collected using exception handling.

S110 단계에서, 상기 데이터 가공부(100)는 예외상황 발생시, 해당 페이지를 다시 크롤링할 수 있다. 데이터 크롤링 후, 기업이름, 구인공고제목, 마감일, 채용직무, 경력, 고용형태, 급여, 스킬, 기업소개, 주요업무, 자격요건, 우대사항, 채용절차, 복리후생 별로 데이터를 나누어 csv 파일로 저장할 수 있다.In step S110, the data processing unit (100) can re-crawl the page when an exceptional situation occurs. After crawling the data, the data can be divided into company name, job posting title, closing date, job position, career, employment type, salary, skills, company introduction, main job, qualification requirements, preferential treatment, hiring process, and welfare benefits and saved as a CSV file.

S110 단계에서, 상기 데이터 가공부(100)는 저장한 csv 파일에 대해 일괄적으로 전처리 작업(불용어 처리 + 성격이 비슷한 데이터는 하나의 column으로 합침)을 수행할 수 있다. 예를 들어, 필요스킬, 자격요건, 우대사항은 성격이 같기 때문에 '자격요건' column에 몰아넣을 수 있다. 예를 들어, 구인공고제목, 직무, 주요업무는 성격이 같기 때문에 'task' column에 몰아넣을 수 있다.In step S110, the data processing unit (100) can perform batch preprocessing (stop word processing + merge data with similar characteristics into one column) on the saved CSV file. For example, required skills, qualifications, and preferential conditions can be grouped into the 'qualifications' column because they have similar characteristics. For example, job posting title, job title, and main tasks can be grouped into the 'task' column because they have similar characteristics.

S110 단계에서, 상기 데이터 가공부(100)는 똑같은 구인공고라도, 마감일이 다른 경우 다른 구인공고로 인식하여 저장되기 때문에, pandas 모듈의 drop_duplicates를 이용하여 '기업명+구인공고제목'이 같은 경우, 중복 데이터를 제외할 수 있다.In step S110, the data processing unit (100) recognizes and stores the same job postings with different deadlines as different job postings, so the drop_duplicates function of the pandas module can be used to exclude duplicate data when the 'company name + job posting title' is the same.

S110 단계에서, 상기 데이터 가공부(100)는 불용어 처리시, 사용자에게 구인공고의 raw data또한 제공할 것이기 때문에, 원본상태의 구인공고 글을 따로 column을 만들어 보관할 수 있다.In step S110, since the data processing unit (100) will also provide the user with raw data of the job posting when processing stop words, the original job posting can be stored in a separate column.

In step S110, when saving to CSV, a job posting that contains the "\r" character causes an unintended row break, so the data processing unit (100) can replace \r with \n.
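A minimal sketch of this clean-up, using only Python's standard csv module, is shown below. The column names and the sample posting are hypothetical, and every field value is assumed to be a string.

```python
import csv
import io


def normalize_newlines(text):
    """Replace carriage returns with "\n"; "\r\n" is handled first to avoid doubled breaks."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def write_postings(rows, fieldnames):
    """Serialize job postings to CSV text after normalizing line breaks in every field."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({key: normalize_newlines(value) for key, value in row.items()})
    return buffer.getvalue()


# Hypothetical posting whose raw text mixes "\r" and "\r\n" line endings.
rows = [{"company": "ExampleCo", "raw": "line one\rline two\r\nline three"}]
csv_text = write_postings(rows, ["company", "raw"])
parsed = list(csv.DictReader(io.StringIO(csv_text, newline="")))
```

The csv writer quotes fields that contain "\n", so the posting is read back as a single record with its line breaks intact.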

S110 단계에서, 상기 데이터 가공부(100)는 여러 토크나이저를 사용한 토큰화 기법들을 비교한 후, 가장 토큰화가 잘되는(문장에서 중요한 단어를 잘 추출하는) 토크나이저를 사용할 수 있다.In step S110, the data processing unit (100) can compare tokenization techniques using multiple tokenizers and then use the tokenizer that tokenizes best (extracts important words from sentences well).

Kiwi, Mecab, Kkma, Okt 등을 비교한 결과, kiwi와 Mecab 토크나이저가 가장 품사 태그를 다양하게 나누고, 학습에 적합하다.Comparing Kiwi, Mecab, Kkma, and Okt, we found that Kiwi and Mecab tokenizers divide part-of-speech tags most diversely and are suitable for learning.

S110 단계에서, 상기 데이터 가공부(100)는 불용어 처리된 데이터에서, kiwi토크나이저를 이용하여 '일반명사, 고유명사, 영어, 어근'으로 토큰화할 수 있다.In step S110, the data processing unit (100) can tokenize stopword-processed data into ‘common nouns, proper nouns, English words, and roots’ using the kiwi tokenizer.

In step S110, the data processing unit (100) can use the Mecab tokenizer to tokenize data that raises an exception in Kiwi.
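The fallback arrangement can be sketched independently of any particular tokenizer library. In the sketch below, `primary_stub` and `fallback_stub` are hypothetical stand-ins for the Kiwi and Mecab tokenizers; they do not reproduce the behavior or API of either library.

```python
def tokenize_with_fallback(text, primary, fallback):
    """Try the primary tokenizer; if it raises, tokenize with the fallback instead."""
    try:
        return primary(text)
    except Exception:
        return fallback(text)


def primary_stub(text):
    """Stand-in for the primary tokenizer: rejects input that mixes Latin and other scripts."""
    has_latin = any(ch.isascii() and ch.isalpha() for ch in text)
    has_other = any(not ch.isascii() for ch in text)
    if has_latin and has_other:
        raise ValueError("mixed-script input not supported by this stub")
    return text.split()


def fallback_stub(text):
    """Stand-in for the fallback tokenizer: lowercases and splits on whitespace."""
    return text.lower().split()


plain = tokenize_with_fallback("Python developer", primary_stub, fallback_stub)
mixed = tokenize_with_fallback("Python 개발자", primary_stub, fallback_stub)
```

Passing the tokenizers in as callables keeps the fallback logic separate from the libraries, so either tokenizer could be swapped without changing the calling code.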

S110 단계에서, 상기 데이터 가공부(100)는 개발한 토크나이저를 이용하여 csv 파일을 두 개 만들 수 있다. 예를 들어, 복리후생과 task column을 토큰화 한 csv 파일, 자격요건과 task column을 토큰화 한 csv 파일을 생성할 수 있다.In step S110, the data processing unit (100) can create two CSV files using the developed tokenizer. For example, a CSV file that tokenizes the welfare and task columns and a CSV file that tokenizes the qualifications and task columns can be created.

S110 단계에서, 상기 데이터 가공부(100)는 각 csv파일을 만들 때에 task column을 추가적으로 토큰화 하는 이유는, 학습모델을 2개 만들 것이기 때문이다. 예를 들어, 복리후생/자격요건을 학습한 모델 + task를 학습한 모델을 만들 수 있다.In step S110, the data processing unit (100) additionally tokenizes the task column when creating each CSV file because two learning models will be created. For example, a model that has learned welfare/qualification requirements + a model that has learned tasks can be created.

In step S110, the data processing unit (100) uses two learning models for the following reason: it recommends companies that are similar to the user-entered company in one of the 'welfare/qualification requirements' criteria, but if the 'welfare/qualification requirements' are similar while the advertised job itself differs, the recommendation is meaningless to the user.

상기 데이터 가공부(100)는 토큰화 된 데이터를 Doc2Vec 모델을 이용해 학습을 수행하는데, 이 때 Doc2Vec모델의 Hyperparameter를 조정해 가며, 가장 우수한 정확도를 가지는 값을 채택할 수 있다.The above data processing unit (100) performs learning on tokenized data using a Doc2Vec model, and at this time, the hyperparameters of the Doc2Vec model can be adjusted to adopt a value with the best accuracy.

이 때 기계 학습의 방법으로 Doc2Vec을 사용한 이유는, TensorFlow를 이용한 문서 임베딩 방법과 비교해 보았을 때 속도가 더 빠르고, 여러 hyperparameter를 조정하며 학습 가능하다는 점에서 Doc2Vec이 학습에 더 유리할 것으로 판단했기 때문이다.The reason Doc2Vec was used as a machine learning method at this time is because, compared to the document embedding method using TensorFlow, it is faster and it is possible to learn by adjusting multiple hyperparameters, so Doc2Vec was judged to be more advantageous for learning.

S110 단계에서, 상기 데이터 가공부(100)는 학습한 두개의 모델 중 task를 학습한 모델에서, 사용자가 입력한 기업과 70% 이상 유사한 기업들의 index를 우선적으로 추출할 수 있다.At step S110, the data processing unit (100) can preferentially extract the index of companies that are 70% or more similar to the company input by the user from the model that learned the task among the two learned models.

In step S110, the data processing unit (100) sets the lower bound on job similarity to 70% because the job model (the model trained on the task column) was trained on a large amount of data (the 'job posting title, job, and main tasks' data were merged into the task column), so postings with 70% similarity proved to be similar when the actual data were compared.

S110 단계에서, 상기 데이터 가공부(100)는 직무모델에서 추출된 기업들에 대해 '복리후생/자격요건'모델로 유사한 기업들을 추출하여 유사도가 높은 순으로 출력합니다. 예를 들어, 최종적으로 직무가 70%이상 유사하면서 '복리후생/자격요건'또한 유사한 기업들이 추출될 수 있다.In step S110, the data processing unit (100) extracts similar companies from the companies extracted from the job model using the 'welfare/qualification requirements' model and outputs them in order of high similarity. For example, companies whose jobs are 70% or more similar and whose 'welfare/qualification requirements' are also similar can be extracted.
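The two-stage selection described above can be sketched with plain vectors. Cosine similarity is assumed as the similarity measure, and the tiny two-dimensional vectors, the company keys, and the function names are hypothetical; in the described system the vectors would come from the two trained Doc2Vec models.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(query, task_vecs, criterion_vecs, task_floor=0.7):
    """Keep companies whose job similarity meets the floor, then rank by the chosen criterion."""
    candidates = [
        name for name in task_vecs
        if name != query and cosine(task_vecs[query], task_vecs[name]) >= task_floor
    ]
    scored = [(name, cosine(criterion_vecs[query], criterion_vecs[name])) for name in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical embeddings: "Q" is the posting entered by the user.
task_vecs = {"Q": [1.0, 0.0], "A": [0.9, 0.1], "B": [0.0, 1.0], "C": [0.8, 0.2]}
criterion_vecs = {"Q": [0.0, 1.0], "A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.1, 0.9]}
result = recommend("Q", task_vecs, criterion_vecs)
```

Company B matches the query on the chosen criterion but is dropped in the first stage because its job is dissimilar, which is the situation the two-model design is meant to filter out.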

본 발명의 일 실시 예에 따른 구인구직 매칭 방법은 상기 기업 조건을 만족하는 기업들과 적어도 하나 이상의 상기 사용자 스펙에 따른 기업 선호도의 상관 관계를 학습하고, 연산하는 단계(S120)를 포함할 수 있다.A job search matching method according to one embodiment of the present invention may include a step (S120) of learning and calculating a correlation between companies satisfying the above-mentioned company conditions and company preferences according to at least one of the above-mentioned user specifications.

In step S120, the learning unit (200) may learn and calculate the correlation between the companies satisfying the corporate condition and the corporate preference according to at least one of the user specifications. The learning unit (200) may select and list the companies satisfying the corporate condition. From that list, the learning unit (200) may order the companies that the user is likely to prefer according to the correlation of corporate preference with the user's specifications.

In step S120, the learning unit (200) may match at least one user specification with a corporate preference. The learning unit (200) may learn by defining, as a probability, the correlation between a change in the user specification and a change in the corporate preference. Based on that probability, the learning unit (200) may calculate the correlation between the corporate condition and the corporate preference according to the at least one user specification.

In step S120, the learning unit (200) may measure the correlation by defining, as a probability, the correlation between a change in the weather-condition attribute and a change in the state of the target, and learning it.

In step S120, the learning unit (200) may calculate the influence between the companies satisfying the corporate condition and the corporate preference according to at least one of the user specifications.

In step S120, the influence is a value that depends on how strongly a change in the user specification affects the corporate preference, and the learning unit (200) may calculate it using Mathematical Formula 1 below.

[Mathematical Formula 1]

In step S120, the computational load in the learning unit (200) represents the expected value (mean) of the influence, so a large computational load means a large average influence. Because classification becomes harder as uncertainty grows, the amount of computation can be reduced by placing the item with the smallest computational load at the upper decision node. The computational load may be obtained from Mathematical Formula 2 below.

[Mathematical Formula 2]

In step S120, the learning unit (200) may obtain the correlation from Mathematical Formula 3 below.

[Mathematical Formula 3]

A job search matching method according to one embodiment of the present invention may include a step (S130) of determining a ranking of the companies satisfying the corporate condition according to the correlation.

In step S130, the learning unit (200) may determine the ranking of the user specifications according to the correlation. The learning unit (200) may rank them in ascending order of correlation, lowest first, and may give recommendation priority to the user specifications with low correlation.
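As an illustration of the ordering described above, the following minimal Python sketch sorts user specifications by ascending correlation. The function name `rank_specs`, the specification names, and the correlation values are hypothetical placeholders, not values from the embodiment.

```python
def rank_specs(correlations):
    """Return specification names ordered by ascending correlation (lowest first)."""
    return sorted(correlations, key=lambda name: correlations[name])

# Hypothetical correlation values, for illustration only.
correlations = {"major": 0.42, "region": 0.15, "age": 0.77}
ranking = rank_specs(correlations)  # the lowest-correlation specification comes first
```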

In step S130, the learning unit (200) may determine a ranking that includes the corporate preference according to a combination of a plurality of the user specifications.

A job search matching method according to one embodiment of the present invention may include a step (S140) of presenting a list of the companies to the user according to the ranking of the companies based on the correlation.

In step S140, the visualization unit (300) may present the list of companies to the user according to the ranking of the companies based on the correlation.

In step S140, the visualization unit (300) may provide results ordered by the ranking of the correlation of corporate preference according to a plurality of the user specifications. The visualization unit (300) may likewise provide results ordered by the ranking of the correlation of corporate preference according to a plurality of combinations of the user specifications.

In step S140, when a user receives recommendations for similar companies and reviews the recommended postings, the visualization unit (300) may visualize the data so that the user can easily see which parts were similar. The visualization unit (300) may use a word cloud (WordCloud) to visualize the key words that appear frequently in each company's posting. The visualization unit (300) may also show similar companies on a two-dimensional coordinate plane on an additional web page using TSNE (a technique for reducing the dimensionality of high-dimensional vector data), giving the user an intuitive and engaging way to explore similar companies.

In step S140, the visualization unit (300) may deploy the system developed through the 'data embedding' stage on a web server built with Flask and serve its output. To send data from the back end to the front end, the back end passes the data as parameters when it calls the render_template function, which renders an HTML file for the client. On the front end, JavaScript and jQuery may be used to display the received data in the components laid out in the HTML. Data may be sent from the front end to the web server either through the GET method of a form tag or by including the data in the URL that the front end requests, after which the back end processes and uses it.
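The front-end-to-server path described above, in which data travels inside the requested URL, can be sketched with the Python standard library alone. This is an assumption-laden illustration, not the Flask code of the embodiment: the route `/result`, the parameter names `link` and `criterion`, and both helper functions are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_request_url(base, posting_link, criterion):
    """Front-end side: encode the posting link and criterion as GET parameters."""
    query = urlencode({"link": posting_link, "criterion": criterion})
    return f"{base}?{query}"

def read_request(url):
    """Back-end side: recover the posting link and criterion from the requested URL."""
    params = parse_qs(urlsplit(url).query)
    return params["link"][0], params["criterion"][0]

# Hypothetical posting link; the nested "?" and "=" are percent-encoded by urlencode.
url = build_request_url("/result", "https://example.com/job/123?x=1", "welfare")
link, criterion = read_request(url)
```

Percent-encoding matters here because the submitted value is itself a URL; without it, the query string of the posting link would be confused with the parameters of the request.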

Because the HTML file that shows visualization information must present newly generated information rather than data cached in the web browser, caching is disabled using Cache-Control, and a random number is appended to the URL whenever a visualization image is requested from the server. In this way, even when an image file of the same name is overwritten with different visualization information, the updated image is delivered to the user.
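The two cache-avoidance measures described above might look like the following sketch. The description names the Cache-Control header but not a specific directive, so the `no-store` value is an assumption, as are the query parameter name `v`, the helper name `cache_busted`, and the image path.

```python
import random

# Assumed directive: "no-store" tells the browser not to keep a cached copy.
NO_CACHE_HEADERS = {"Cache-Control": "no-store"}

def cache_busted(image_url, rng=random):
    """Append a random number so each request for the image has a distinct URL."""
    separator = "&" if "?" in image_url else "?"
    return f"{image_url}{separator}v={rng.randint(0, 10**9)}"

# Hypothetical image path; the file is overwritten in place with each new plot.
image_src = cache_busted("/static/wordcloud.png")
```

The header covers the HTML page itself, while the random suffix covers the image, whose file name stays the same even though its content changes.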

Start page: hello.html,

Loading page: index.html,

Analysis results page: result.html,

The user enters a Jobplanet job posting link in apply.html and then selects the criterion (welfare benefits or qualification requirements) by which similar companies should be recommended.

In step S140, the web server extracts similar companies from the trained model according to the link and criterion entered by the user, and the visualization unit (300) presents them to the user as a table in result.html.

In step S140, when the user clicks [View Details] for a company in result.html, the visualization unit (300) provides the WordCloud result of the keyword analysis of that company's job posting together with the raw, unpreprocessed posting content.

In step S140, if the user requests recommendations by welfare benefits but the entered link contains no 'welfare benefits' information, the system shows nobokri.html and asks the user to enter a link again.

In step S140, if no company is similar to the company of the entered link (that is, no company is similar in both job and the selected criterion), the system shows nolist.html to report that no similar company exists and instead recommends companies that are not similar in the selected criterion but are similar in job.
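The page selection described in the preceding paragraphs can be summarized as a small decision function. The sketch below is illustrative only: the template names come from the description, while the function name `choose_page`, the criterion key `"welfare"`, and the shapes of the arguments are assumptions.

```python
def choose_page(criterion, posting, similar, job_only):
    """Return (template_name, companies_to_show) for one recommendation request.

    criterion: "welfare" or "qualification" (hypothetical keys)
    posting:   dict of fields parsed from the entered link
    similar:   companies similar in both job and the chosen criterion
    job_only:  companies similar in job only
    """
    if criterion == "welfare" and not posting.get("welfare"):
        # The posting has no welfare information: ask for another link.
        return "nobokri.html", []
    if not similar:
        # Nothing matches both job and criterion: fall back to job-only matches.
        return "nolist.html", job_only
    return "result.html", similar
```

For example, `choose_page("welfare", {"welfare": ""}, ["A"], ["B"])` selects nobokri.html regardless of the candidate lists, because the missing-data check comes first.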

In step S140, the user can move from result.html to display_plot.html, which shows the visualization information.

In step S140, display_plot uses Plotly.js to draw scatter plots in the HTML page from the TSNE coordinate data received from the server.

In step S140, two plots are shown: a scatter plot of the companies similar to the user-input company in the user-selected criterion, and a scatter plot of the companies similar to it in job.

In step S140, linktoimg.html provides not only the WordCloud result and the raw posting content but also the 'similarity' for each criterion, so that the user can check for themselves how reliable the system's recommendations are.

In step S140, to provide visualized information to the user, the visualization unit (300) reduces the data to two dimensions using the TSNE module and then plots it on a two-dimensional plane.

In step S140, the visualization unit (300) does not visualize every company; it visualizes only the companies within a certain distance of the user-input company (that is, the most similar companies).
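A minimal sketch of this distance-based selection follows, assuming the companies have already been reduced to two-dimensional coordinates and that distance is Euclidean. The function name `within_radius`, the radius value, and the coordinates are hypothetical.

```python
import math

def within_radius(points, query, radius):
    """Keep the query company and every company no farther than `radius` from it."""
    qx, qy = points[query]
    return {
        name: (x, y)
        for name, (x, y) in points.items()
        if math.hypot(x - qx, y - qy) <= radius
    }

# Hypothetical 2-D coordinates after dimensionality reduction.
points = {"Q": (0.0, 0.0), "A": (0.3, 0.4), "B": (3.0, 4.0)}
visible = within_radius(points, "Q", radius=1.0)  # B lies at distance 5 and is dropped
```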

In step S140, to keep two markers whose positions differ only minimally from appearing as a single marker, the visualization unit (300) slightly shifts the position of one of the two whenever the distance between any two markers is very small, so that they do not overlap.
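One simple way to realize this adjustment is sketched below. The description does not say how far or in which direction a marker is moved, so shifting along the x-axis, the thresholds `min_dist` and `shift`, and the function name `separate_markers` are all assumptions made for illustration.

```python
import math

def separate_markers(points, min_dist=0.05, shift=0.1):
    """Place markers one by one, nudging a marker along x until it is at
    least `min_dist` away from every marker already placed."""
    placed = []
    for x, y in points:
        while any(math.hypot(x - px, y - py) < min_dist for px, py in placed):
            x += shift  # assumed direction and step size
        placed.append((x, y))
    return placed

# Hypothetical coordinates: the first two markers nearly coincide.
markers = separate_markers([(0.0, 0.0), (0.01, 0.0), (1.0, 1.0)])
```

Only the later of two near-coincident markers moves, so the first marker placed, for instance the user-input company if it is listed first, keeps its exact coordinates.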

In step S140, the visualization unit (300) generates and provides a WordCloud using the tokenized welfare benefits/qualification requirements data as keywords.

If the user selected 'welfare benefits' as the criterion for judging similarity, the WordCloud result built from the tokenized 'welfare benefits' data is shown; if the user selected 'qualification requirements', the WordCloud result built from the tokenized 'qualification requirements' data is shown.
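The selection of token data by criterion can be sketched as follows. A word cloud is typically drawn from word frequencies, so this example only computes those frequencies with the standard library and leaves rendering aside. The function name `wordcloud_weights`, the dictionary keys, and the sample tokens are hypothetical.

```python
from collections import Counter

def wordcloud_weights(posting_tokens, criterion, top_n=50):
    """Count tokens for the chosen criterion; more frequent words get larger weights."""
    return Counter(posting_tokens[criterion]).most_common(top_n)

# Hypothetical tokenized fields of one job posting.
posting_tokens = {
    "welfare": ["meals", "bonus", "meals", "insurance"],
    "qualification": ["python", "sql", "python", "python"],
}
weights = wordcloud_weights(posting_tokens, "welfare")
```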

The present invention has been described above with reference to its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that it may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

Claims

A step of receiving, from a user, at least one corporate condition desired by the user and at least one user specification;
A step of learning and calculating a correlation between companies satisfying the corporate condition and a corporate preference according to at least one of the user specifications;
A step of determining a ranking of the companies satisfying the corporate condition according to the correlation; and
Including a step of presenting a list of the companies to the user according to the ranking of the companies based on the correlation,
The step of receiving the at least one corporate condition and user specification,
Embeds job postings of a plurality of companies, including the company of the link entered by the user, into 200-dimensional embedding vectors,
The step of presenting to the user,
After reducing the embedding vectors to two dimensions, visualizes only the company of the link entered by the user and the companies located a certain distance away from that company as points on a two-dimensional coordinate plane - the company of the link entered by the user is expressed as a red dot on the coordinate plane, and the companies located a certain distance away are expressed as blue dots -
A job search matching method characterized in that, when the user clicks on any point on the two-dimensional coordinate plane, the similarity between the job posting content of the company corresponding to the point and that of the company of the link entered by the user is provided to the user - the similarity being a word cloud result created using tokenized data of either the welfare benefits or the qualification requirements of the job posting.

In claim 1,
The step of calculating the correlation includes,
A step of matching at least one user specification with a corporate preference;
A step of defining, as a probability, the correlation between a change in the user specification and a change in the corporate preference, and learning it; and
A job search matching method characterized by including a step of calculating, according to the probability, the correlation between the corporate condition and the corporate preference according to the at least one user specification.

In claim 2,
The step of calculating the correlation,
A job search matching method characterized in that the correlation is calculated by Mathematical Formula 1 below.
[Mathematical Formula 1]

(Here, C(S, P) is the correlation, E(S) is the analysis relevance value, S is the user specification, and P is the corporate preference.)

In claim 1,
The user specification,
A job search matching method characterized by including at least one of school, major, gender, age, region, and MBTI.

In claim 4,
The step of determining the ranking of the companies,
Determines a ranking that includes the corporate preference according to a combination of a plurality of the user specifications,
The step of presenting to the user,
A job search matching method characterized by providing results according to the ranking.

A data processing unit that receives, from a user, at least one corporate condition desired by the user and at least one user specification;
A learning unit that learns and calculates a correlation between companies satisfying the corporate condition and a corporate preference according to at least one of the user specifications;
A ranking unit that determines a ranking of the companies satisfying the corporate condition according to the correlation; and
Including a visualization unit that presents a list of the companies to the user according to the ranking of the companies based on the correlation,
The data processing unit,
Embeds job postings of a plurality of companies, including the company of the link entered by the user, into 200-dimensional embedding vectors,
The visualization unit,
After reducing the embedding vectors to two dimensions, visualizes only the company of the link entered by the user and the companies located a certain distance away from that company as points on a two-dimensional coordinate plane - the company of the link entered by the user is expressed as a red dot on the coordinate plane, and the companies located a certain distance away are expressed as blue dots -
A job search matching system characterized in that, when the user clicks on any point on the two-dimensional coordinate plane, the similarity between the job posting content of the company corresponding to the point and that of the company of the link entered by the user is provided to the user - the similarity being a word cloud result created using tokenized data of either the welfare benefits or the qualification requirements of the job posting.

In claim 6,
The learning unit,
A job search matching system characterized by matching at least one user specification with a corporate preference, defining as a probability the correlation between a change in the user specification and a change in the corporate preference and learning it, and calculating, according to the probability, the correlation between the corporate condition and the corporate preference according to the at least one user specification.

In claim 7,
The learning unit,
A job search matching system characterized in that the correlation is calculated by Mathematical Formula 1 below.
[Mathematical Formula 1]

(Here, C(S, P) is the correlation, E(S) is the analysis relevance value, S is the user specification, and P is the corporate preference.)

In claim 8,
The user specification,
A job search matching system characterized by including at least one of school, major, gender, age, region, and MBTI.

In claim 9,
The learning unit,
Determines a ranking that includes the corporate preference according to a combination of a plurality of the user specifications,
The visualization unit,
A job search matching system characterized by providing results according to the ranking.