TWI854521B

TWI854521B - Attention list collating and analyzing method and system

Info

Publication number: TWI854521B
Application number: TW112108568A
Authority: TW
Inventors: 蔡佩玲; 歐曜瑋; 王怡蘋
Original assignee: 富邦金融控股股份有限公司
Priority date: 2023-03-08
Filing date: 2023-03-08
Publication date: 2024-09-01
Also published as: TW202437135A

Abstract

An attention list collating and analyzing method, configured to analyze and compare news articles from at least one news media source to generate an attention list, comprises the following steps: a news collection module collecting news articles from the at least one news media source; a concerned subject comparison module identifying whether each news article has a concerned subject and a concerned issue, and capturing at least one keyword from the news articles having the concerned subject and the concerned issue; a duplicate information comparison module comparing each received news article having the concerned subject and the concerned issue, together with its at least one keyword, against a destination file, and updating the destination file according to the result of each comparison; and an integration module generating the attention list and related information according to the destination file.

Description

Attention list collating and analyzing method and system

The present invention relates to an information analysis method and system, and in particular to an attention list collating and analyzing method and system for analyzing persons who appear in news media information sources.

With economic globalization and the rapid development of information technology, financial crimes such as money laundering and fraud have increased year by year and have become increasingly organized and professional. To comply with anti-money-laundering programs and reduce the number of money laundering and fraud cases, banks have begun to strengthen their Know Your Customer/Customer Due Diligence (KYC/CDD) processes. KYC/CDD is the process by which a bank gets to know each customer. It generally refers to the bank collecting documents about a customer and searching online databases, before the customer first opens an account with a financial institution or signs a business agreement, in order to understand the customer's identity, background, sources of income and funds, and financial condition, and whether the customer has been sanctioned or is associated with negative news. In other words, the main function of KYC/CDD is to let a financial institution assess risk based on the information obtained from the documents the customer submits, and to decide from that assessment whether to approve the customer's account opening application or business agreement. When a customer is classified as high risk, the institution may refuse the account opening, reject the business agreement, or conduct enhanced due diligence. If, after review, the institution approves the application or agreement but the customer remains at high risk of suspicious transactions, the institution may restrict the customer's future financial transactions, periodically monitor those transactions and business dealings, and search for negative news related to the customer.

The usual way to track news about such customers is to search the Internet directly, particularly news websites that publish large volumes of news and are therefore more likely to carry reports about the high-risk customers described above. However, because of the sheer volume of information online, keyword searches for negative news about these customers often return an overwhelming amount of data, making it hard to extract the important, relevant information quickly and accurately; moreover, the search results are usually mixed with advertising that has little relevance to the query.

At present, the news results returned by such keyword searches are compared, screened, and de-duplicated manually to obtain the required information. This process involves:
(i) When multiple news articles and media items cover the same event, their content is highly repetitive but differs slightly from item to item, so the items must be compared to screen out the names related to the same news event and to eliminate duplicates;
(ii) Not every name appearing in news reports, articles, and other data belongs to a person relevant to the search target, so the names must be reviewed and the irrelevant ones removed;
(iii) A new list overlaps with the previous one and the same name may appear repeatedly, so each item must be checked to determine whether it belongs to the same news event;
(iv) As a news event develops over time, the status and identity of the same person may change; for example, someone who was originally only a suspect may become an offender after being prosecuted, sentenced, or convicted. Litigation information must therefore be collected additionally, and specific negative-news persons who meet preset conditions must be flagged;
(v) When the subject of negative news is a non-natural person, whether it still validly exists or has been dissolved or reorganized must additionally be verified through the relevant public information platforms.
This process is quite complicated. Relying solely on manual judgment and repeated comparison is likely to produce a high error rate and to consume considerable time.

Therefore, there is a need for an information analysis system that can quickly obtain useful, relevant news media data from Internet search results, so as to solve the problems of the prior art.

In view of this, the present invention provides an attention list collating and analyzing method and system to solve the problems described above.

According to one embodiment of the present invention, the attention list collating and analyzing method analyzes and compares a plurality of news articles from at least one news media information source to generate a list of persons of interest and related information. The method includes the following steps: a news collection module collects news articles from the at least one news media information source; a subject comparison module uses natural language processing to identify whether each news article contains a subject of interest, then determines whether the content of each article containing a subject of interest matches a topic of interest, and extracts at least one keyword from each article that both contains a subject of interest and matches a topic of interest; a duplicate information comparison module receives from the subject comparison module each such article and its corresponding at least one keyword, compares each received article against a destination file, and updates the destination file according to the comparison result for each article; and an integration module generates the list of persons of interest and related information from the destination file.

The method may further include the following step: an automatic delivery module sends out the list of persons of interest and related information generated from the destination file as a message notification.

A user may enter the subject of interest and set the topic of interest through a user interface.

The subject comparison module may compare the news articles against cross-reference information sources to identify subjects of interest, using a subject comparison model generated with a machine learning algorithm. The cross-reference information sources include commercial databases, value-added information databases, and websites for querying digital footprints.

Based on the result of comparing each news article with the destination file, the duplicate information comparison module further performs the following steps: when the comparison of a first news article with the destination file shows that the subject of interest is a duplicate and there is no difference in information, the module deletes the first news article; when the comparison shows that the subject of interest is a duplicate but there is an information change, the module adds the information change to the destination file; and when the comparison shows that the subject of interest is not a duplicate, the module adds the first news article to the destination file.

The method may further include the following step: the duplicate information comparison module compares each received news article against the destination file by means of an information comparison model generated with a supervised machine learning algorithm.
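The patent does not say how the information comparison model decides whether a new report carries anything the destination file lacks. As a rough, non-learned stand-in for that supervised model, the sketch below compares character 3-gram sets with Jaccard similarity; character n-grams need no word segmentation, which suits Chinese news text. The function names and the 0.8 threshold are illustrative assumptions, not values from the patent.

```python
def shingles(text: str, k: int = 3) -> set:
    """Character k-grams of the text with whitespace removed and case folded."""
    t = "".join(text.split()).lower()
    return {t[i:i + k] for i in range(max(len(t) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def has_new_information(article: str, stored: list, threshold: float = 0.8) -> bool:
    """True when the article is not a near-duplicate of any stored report.

    The threshold is a hypothetical tuning knob, not a value from the patent.
    """
    s = shingles(article)
    return all(jaccard(s, shingles(old)) < threshold for old in stored)
```

A learned model, as the patent describes, would replace the fixed threshold with a decision trained on labeled pairs of reports.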

Another aspect of the present invention provides an attention list collating and analyzing system. According to another embodiment, the system analyzes a plurality of news articles from at least one news media information source to generate a list of persons of interest and related information, and includes a news collection module, a subject comparison module, a duplicate information comparison module, and an integration module. The news collection module connects to the at least one news media information source to collect its news articles. The subject comparison module is coupled to the news collection module to receive the news articles; it uses natural language processing to identify whether each news article contains a subject of interest, then determines whether the content of each article containing a subject of interest matches a topic of interest, and extracts at least one keyword from each article that both contains a subject of interest and matches a topic of interest. The duplicate information comparison module is coupled to the subject comparison module; it receives each such article and its corresponding at least one keyword from the subject comparison module, compares each received article against a destination file, and updates the destination file according to the comparison result for each article. The integration module is coupled to the duplicate information comparison module and generates the list of persons of interest and related information from the destination file.

The system may further include an automatic delivery module coupled to the integration module, which sends out the list of persons of interest and related information generated from the destination file as a message notification.

The subject comparison module is coupled to cross-reference information sources and compares the news articles against them to identify subjects of interest. The cross-reference information sources include commercial databases, value-added information databases, and websites for querying digital footprints.

The subject comparison module includes a subject comparison model, and the duplicate information comparison module includes an information comparison model; both models are generated with supervised machine learning algorithms. The subject comparison model compares the news articles against the cross-reference information sources to identify subjects of interest, and the information comparison model compares each received news article against the destination file.

In summary, the present invention provides an attention list collating and analyzing method and system in which a subject comparison module screens for news articles that match the user's subjects and topics of interest, and a duplicate information comparison module then compares the screened articles with a destination file and updates the destination file according to the comparison results. The method and system can thus quickly extract useful, relevant news media information from Internet search results. In addition, by introducing artificial intelligence in the form of supervised machine learning, the invention reduces misjudgments and duplicates in the list of persons of interest and filters out invalid information, which lowers the human error rate and the time spent on manual interpretation, improving efficiency and reducing time and labor costs. Furthermore, through the automatic delivery module, users receive the latest list of persons of interest and related information immediately or periodically without spending extra time searching again, which further reduces labor and time costs and improves overall search and retrieval efficiency.

FIG. 1 is a functional block diagram of an attention list collating and analyzing system according to an embodiment of the present invention.

FIG. 2 is a flowchart of the steps of an attention list collating and analyzing method according to an embodiment of the present invention.

FIG. 3 is a flowchart of the steps of an attention list collating and analyzing method according to another embodiment of the present invention.

FIG. 4 is a flowchart of the steps of an attention list collating and analyzing method according to yet another embodiment of the present invention.

To make the advantages, spirit, and features of the present invention easier and clearer to understand, they are described and discussed in detail below through specific embodiments with reference to the accompanying drawings. These embodiments are merely representative of the present invention, and the specific methods, devices, conditions, materials, and so on cited in them are not intended to limit the present invention or the corresponding embodiments. Note also that the devices in the drawings only indicate their relative positions and are not drawn to scale.

Please refer to FIG. 1, a functional block diagram of an attention list collating and analyzing system 1 according to an embodiment of the present invention. This embodiment provides an attention list collating and analyzing method and system for analyzing and comparing news articles from news media information sources to generate a list of persons of interest and related information. As shown in FIG. 1, in this embodiment the system 1 includes a news collection module 11, a subject comparison module 12, a duplicate information comparison module 13, and an integration module 14. The news collection module 11 is coupled to at least one external news media information source 21. The subject comparison module 12 is coupled to the news collection module 11, the duplicate information comparison module 13 is coupled to the subject comparison module 12, and the integration module 14 is coupled to the duplicate information comparison module 13. In this embodiment the subject comparison module 12 is further coupled to an external cross-reference information source 22. The subject comparison module 12 includes a subject comparison model 121, and the duplicate information comparison module 13 includes an information comparison model 131. With the system 1, news articles can be collected from the news media information source 21 and analyzed to extract information about the persons or customers to be monitored, which serves as a basis for monitoring high-risk customers and can even help prevent money laundering or fraud. The functions of each module of the system 1 are explained below together with the method. In practice, the news collection module of the system 1 may be a network module of a computer system that connects to external news media information sources, while the subject comparison module, the duplicate information comparison module, and the integration module may be integrated into the central processing unit of a computer system or cloud system, or into an integrated chip.

Please refer to FIG. 1 and FIG. 2 together. FIG. 2 is a flowchart of the steps of the attention list collating and analyzing method according to an embodiment of the present invention. The steps of the method of FIG. 2 can be carried out by the system 1 of FIG. 1, so they are described below with reference to the architecture of the system 1. As shown in FIG. 2, in this embodiment the method includes: step S1, the news collection module 11 collects news articles from the news media information source 21; step S2, the subject comparison module 12 uses natural language processing to identify whether each news article contains a subject of interest, then determines whether the content of each article containing a subject of interest matches a topic of interest, and extracts keywords from each article that both contains a subject of interest and matches a topic of interest; step S3, the duplicate information comparison module 13 receives from the subject comparison module 12 each such article and its corresponding keywords, compares each received article against the destination file, and updates the destination file according to the comparison result for each article; and step S4, the integration module 14 generates the list of persons of interest and related information from the destination file.
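To make the data flow of steps S1 to S4 concrete, the following is a minimal sketch, assuming each news source can be modeled as a callable that returns article text. `Article`, `run_pipeline`, and the dictionary standing in for the destination file are hypothetical names; plain substring matching replaces the natural language processing of step S2, and duplicate handling in step S3 is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Article:
    source: str
    text: str
    keywords: list[str] = field(default_factory=list)


def run_pipeline(
    fetchers: dict[str, Callable[[], Iterable[str]]],
    subjects: list[str],
    topics: list[str],
    destination: dict[str, list[Article]],
) -> list[str]:
    """Hypothetical end-to-end flow of steps S1-S4; all names are illustrative."""
    # S1: news collection pulls raw article text from every source
    articles = [Article(name, text) for name, fetch in fetchers.items() for text in fetch()]

    for art in articles:
        # S2: keep articles that mention a subject AND a topic of interest;
        # substring matching stands in for the patent's NLP step
        hit_subjects = [s for s in subjects if s in art.text]
        hit_topics = [t for t in topics if t in art.text]
        if not (hit_subjects and hit_topics):
            continue
        art.keywords = hit_subjects + hit_topics

        # S3: file the article under each matched subject in the destination
        # file (duplicate handling is omitted in this sketch)
        for s in hit_subjects:
            destination.setdefault(s, []).append(art)

    # S4: integration step emits the list of persons of interest
    return sorted(destination)
```

Passing the destination file in as an argument keeps it alive between runs, which is what lets later runs be compared against earlier results.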

In practice, the news collection module 11 of the system 1 may connect to one or more news media information sources 21, or may use a web crawler to automatically collect news articles and data from various news media sources on the Internet. Non-text data such as audio or video files can be converted into text as needed to obtain their textual content, and a news article list is then generated from the collected articles and related data. When the subject comparison module 12 analyzes the collected articles and data, if no subject of interest appears in an article, it moves on to the next article in the list; if a subject of interest does appear, it then checks whether the article's content matches a topic of interest, thereby screening out the articles that both contain a subject of interest and match a topic of interest. The news media information sources 21 connected to the news collection module 11 may provide news text content or news website URLs, for example online news media platforms and online newspaper and magazine platforms; the data formats may include text, audio, video, and photo files; and the sources and search scope may cover domestic and global media. The news collection module 11 may connect to a single news media information source 21, as shown in FIG. 1, or to multiple sources 21 so as to obtain their articles simultaneously; the number of sources can be adjusted to the user's needs. Finally, the integration module 14 aggregates the destination file of articles that contain a subject of interest and match a topic of interest to produce the list of persons of interest that the user follows, together with related information.
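The paragraph above describes two things: converting non-text items into text to build the article list, and a two-stage screen that checks for a subject first and only then for a topic. The sketch below shows both, assuming items arrive as `(media_type, payload)` pairs and that speech-to-text or OCR is available through a caller-supplied `transcribe` function. Both the pair format and `transcribe` are hypothetical; the patent names no specific conversion tool.

```python
from __future__ import annotations

from typing import Callable, Iterable


def build_article_list(
    items: Iterable[tuple[str, str | bytes]],
    transcribe: Callable[[bytes], str],
) -> list[str]:
    """Turn (media_type, payload) pairs into plain text; non-text payloads
    go through a caller-supplied transcribe() (hypothetical speech/OCR hook)."""
    return [payload if media_type == "text" else transcribe(payload)
            for media_type, payload in items]


def screen(articles: list[str], subjects: list[str], topics: list[str]) -> list[str]:
    """Two-stage screen: check for a subject first, and only then for a topic."""
    kept = []
    for text in articles:
        if not any(s in text for s in subjects):
            continue  # no subject of interest: move on to the next article
        if any(t in text for t in topics):
            kept.append(text)
    return kept
```

Checking the subject first means the topic check only runs on the usually small share of articles that mention someone on the watch list.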

Please refer to FIG. 1 and FIG. 3 together. FIG. 3 is a flowchart of the steps of the method according to another embodiment of the present invention. As before, each step of the method of FIG. 3 can be carried out by the system architecture of FIG. 1, so the steps of FIG. 3 are described below with reference to that architecture. As shown in FIG. 3, this embodiment differs from the previous one in that the method further includes step S5, performed after step S4: the automatic delivery module 15 sends out, as a message notification, the list of persons of interest and related information that the integration module 14 generated from the destination file. The other steps of the method in this embodiment are substantially the same as the corresponding steps of the previous embodiment and are not repeated here.

In practice, the automatic delivery module 15 can deliver the list of persons of interest and related information generated from the destination file by e-mail, mobile text message, or social media platforms. Users can also configure periodic or immediate notifications as needed, so that they receive the latest, updated list and related information promptly or on a schedule without spending extra time searching again, reducing labor and time costs and improving overall efficiency.
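For the e-mail channel, a notification could be assembled with Python's standard `email.message.EmailMessage`. This is only a sketch: `build_notification`, the subject line, the body layout, and the example addresses are assumptions, and the patent does not specify a message format.

```python
from __future__ import annotations

from email.message import EmailMessage


def build_notification(persons: dict[str, list[str]], sender: str, recipient: str) -> EmailMessage:
    """Render the list of persons of interest as a plain-text e-mail.

    Subject wording and body layout are illustrative assumptions.
    """
    msg = EmailMessage()
    msg["Subject"] = f"Attention list update: {len(persons)} person(s)"
    msg["From"] = sender
    msg["To"] = recipient
    # one line per person, with the related information joined by "; "
    body = "\n".join(f"- {name}: {'; '.join(notes)}" for name, notes in sorted(persons.items()))
    msg.set_content(body)
    return msg
```

The message can then be sent with `smtplib.SMTP(...).send_message(msg)`. Periodic delivery can be handled by an external scheduler such as cron, and immediate delivery by calling the same function whenever the destination file changes.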

Furthermore, as shown in FIG. 3, in this embodiment the method further includes step S11, performed before step S2: the user enters the subject of interest and sets the topic of interest through a user interface. Specifically, through the user interface (not shown) of the system, the user can enter in advance the subjects and topics to be followed, so that the subject comparison module 12 can identify those subjects and topics in the content of the news articles. A subject of interest may be a person, an organization, a company or firm, or a foundation; a topic of interest may be money laundering, fraud, corruption, or other illegal conduct. For example, to investigate whether a person has a record of money laundering, fraud, or corruption, the user can enter that person's name as the subject of interest and set money laundering, fraud, and corruption as the topics of interest. In practice, the types of subjects and topics of interest are not limited to these, and their number can be set according to the user's needs.
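Step S11 amounts to collecting a small configuration from the user. One possible shape for it is sketched below. `WatchSettings` and its methods are hypothetical, and the allowed subject kinds come from the examples in the paragraph above. Because the patent leaves the list open-ended, a real system would likely make that set configurable.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Subject kinds named in the description; the patent says the list is open-ended.
SUBJECT_KINDS = {"person", "organization", "company", "foundation"}


@dataclass
class WatchSettings:
    """Hypothetical container for what step S11 collects from the user interface."""
    subjects: dict[str, str] = field(default_factory=dict)  # name -> kind
    topics: set[str] = field(default_factory=set)

    def add_subject(self, name: str, kind: str) -> None:
        if kind not in SUBJECT_KINDS:
            raise ValueError(f"unknown subject kind: {kind!r}")
        self.subjects[name.strip()] = kind

    def add_topics(self, *topics: str) -> None:
        # normalise so "Fraud" and "fraud " count as the same topic
        self.topics.update(t.strip().lower() for t in topics)
```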

Please refer to FIG. 1 and FIG. 4 together. FIG. 4 is a flowchart of the steps of the method according to yet another embodiment of the present invention. As before, each step of the method of FIG. 4 can be carried out by the system architecture of FIG. 1, so the steps of FIG. 4 are described below with reference to that architecture. As shown in FIG. 4, this embodiment differs from the previous ones in that the method further includes step S2', performed after step S1: the subject comparison module 12 uses the subject comparison model 121 to compare the news articles against the cross-reference information source 22 to identify subjects of interest, then determines whether the content of each article containing a subject of interest matches a topic of interest, and extracts keywords from each article that both contains a subject of interest and matches a topic of interest. The subject comparison model 121 is generated with a machine learning algorithm, and the cross-reference information source 22 includes commercial databases, value-added information databases, and websites for querying digital footprints. The other steps of the method in this embodiment are substantially the same as the corresponding steps of the previous embodiments and are not repeated here.

In this specific embodiment, the subject comparison model 121 uses a machine learning algorithm to learn, from a large number of articles, associated words related to persons, places, times, organizations, job titles, and actions. From these it builds an intelligent model of how strongly the events described in a piece of content are associated with an attention subject. In practice, the types of cross-comparison information sources 22 used to compare news articles for attention subjects are not limited to these, and the number of cross-comparison information sources 22 can be adjusted according to user needs.
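One way to make the idea of "associated words around a subject" concrete is a windowed cue count. In this sketch a hand-written lexicon (`CUES`) stands in for the associations the subject comparison model 121 would learn. The window size, the categories, the scoring rule, and the function name `association_score` are all assumptions for illustration, and subjects are treated as single lowercase tokens.

```python
# Hypothetical cue lexicon; in the described system such associations are learned.
CUES = {
    "title": {"chairman", "director", "manager"},
    "action": {"indicted", "arrested", "detained", "charged"},
    "time": {"yesterday", "today", "monday"},
    "place": {"taipei", "court"},
}


def association_score(tokens, subject, cues=CUES, window=4):
    """Fraction of cue categories seen within `window` tokens of any mention of `subject`."""
    positions = [i for i, tok in enumerate(tokens) if tok == subject]
    if not positions:
        return 0.0
    found = set()
    for pos in positions:
        # Slice covers pos - window .. pos + window (inclusive), clipped at the start.
        for tok in tokens[max(0, pos - window): pos + window + 1]:
            for category, words in cues.items():
                if tok in words:
                    found.add(category)
    return len(found) / len(cues)


tokens = "chairman wang was indicted yesterday in taipei".split()
score = association_score(tokens, "wang")
```

Here "taipei" falls just outside the four-token window, so only the title, action, and time categories count toward the score.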

In another specific embodiment, the cross-comparison information source 22 may be an authorized global business database, a business registration query platform, or another publicly available platform. Where information in a news article is unclear, the cross-comparison information source 22 provides supplementary data. Examples include the court judgment concerning a person who is an attention subject, or whether a company or business that is an attention subject has been dissolved or reorganized. In practice, however, the types of cross-comparison information sources 22 are not limited to these, and their types and number can be adjusted according to user needs.
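The supplementary-data role of the cross-comparison information source 22 can be sketched as filling in fields that the article left unclear. `SubjectRecord`, `COURT_RECORDS`, `REGISTRY`, and `supplement` are hypothetical. The two dictionaries stand in for a court-judgment source and a business registration platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubjectRecord:
    name: str
    kind: str                           # "person" or "company"
    judgment: Optional[str] = None      # court outcome, for persons
    legal_status: Optional[str] = None  # e.g. "active", "dissolved", "reorganized"


# Hypothetical stand-ins for a court-judgment source and a business registry.
COURT_RECORDS = {"Alice Example": "convicted at first instance"}
REGISTRY = {"Acme Trading Co.": "dissolved"}


def supplement(record, court_records, registry):
    """Fill fields the article left unclear (None) in place; never overwrite known values."""
    if record.kind == "person" and record.judgment is None:
        record.judgment = court_records.get(record.name)
    if record.kind == "company" and record.legal_status is None:
        record.legal_status = registry.get(record.name)
    return record


company = supplement(SubjectRecord("Acme Trading Co.", "company"), COURT_RECORDS, REGISTRY)
```

Only `None` fields are filled, so any detail already reported in the news article takes precedence over the lookup.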

In addition, this embodiment differs from the above embodiment in that its attention list collating and analyzing method further includes steps S31 to S33. In each of these steps, the duplicate information comparison module 13 receives from the attention subject comparison module 12 each news article that contains an attention subject and matches the attention topic, together with its corresponding keywords, and compares each received news article with the target file. The steps differ in how the comparison result for a first news article among the received news articles is handled:

Step S31: if the comparison shows that the attention subject is repeated and there is no information difference, the duplicate information comparison module 13 deletes the first news article.

Step S32: if the comparison shows that the attention subject is repeated and the information has changed, the duplicate information comparison module 13 adds the information change to the target file.

Step S33: if the comparison shows that the attention subject is not repeated, the duplicate information comparison module 13 adds the first news article to the target file.

Through steps S31 to S33, the duplicate information comparison module 13 updates the target file with the comparison results.
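The three outcomes of steps S31 to S33 can be sketched as follows, assuming the target file is a mapping from attention subject to the set of information items already recorded for it. Exact subject-name matching, a set difference as the test for an "information difference", and the function name `update_target` are simplifying assumptions. In the embodiment this comparison may be performed by the information comparison model 131.

```python
def update_target(target, subject, info):
    """
    target: dict mapping attention subject -> set of information items already recorded
    info:   set of information items (e.g. keywords) extracted from one news article
    Returns the action taken: "deleted", "changed", or "added".
    """
    if subject not in target:        # S33: subject not repeated -> add the article
        target[subject] = set(info)
        return "added"
    new_items = set(info) - target[subject]
    if not new_items:                # S31: repeated, no information difference -> discard
        return "deleted"
    target[subject] |= new_items     # S32: repeated with a change -> record only the change
    return "changed"


target = {}
actions = [
    update_target(target, "Acme Trading Co.", {"fraud", "indicted"}),
    update_target(target, "Acme Trading Co.", {"fraud", "indicted"}),
    update_target(target, "Acme Trading Co.", {"fraud", "convicted"}),
]
```

Storing only the new items in step S32 keeps the target file free of repeated information while still capturing developments such as a change from indictment to conviction.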

In this specific embodiment, the duplicate information comparison module 13 may further include an information comparison model 131 that performs the comparison process described above. The information comparison model 131 is generated by a supervised machine learning algorithm. In supervised machine learning, a model is trained on a training data set that contains labels and features: the label is the prediction target, and the features are the input data the model uses to predict the label. In this embodiment, the training data set uses features such as persons, organizations, names, locations, job titles, and times, with labels such as duplicate or valid/invalid. Its content may include news articles, news article labels, news article classifications, and news article analyses. In actual applications, however, it is not limited to these.
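The training-data layout described above (features such as person, organization, location, job title, and time, with a duplicate/not-duplicate label) might look like the sketch below. The field names, the agreement-based pair features, and the perceptron classifier are assumptions chosen for brevity. The embodiment itself uses a DNN-LSTM model, not a perceptron.

```python
FIELDS = ("person", "organization", "location", "title", "time")


def pair_features(a, b):
    """1.0 for each field on which two article records agree (and are present), else 0.0."""
    return [1.0 if a.get(f) is not None and a.get(f) == b.get(f) else 0.0 for f in FIELDS]


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """samples: list of (features, label), label 1 = duplicate, 0 = not a duplicate."""
    w, b = [0.0] * len(FIELDS), 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = y - predict(w, b, x)
            if err:  # update weights only on mistakes
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b


# Hypothetical labeled pairs of article records.
rec_a = {"person": "Alice Example", "organization": "Acme", "location": "Taipei",
         "title": "chairman", "time": "2023-03"}
rec_b = dict(rec_a, time="2023-04")
samples = [
    (pair_features(rec_a, rec_a), 1),
    (pair_features(rec_a, rec_b), 1),
    ([1.0, 0.0, 0.0, 0.0, 0.0], 0),
    ([0.0, 0.0, 0.0, 0.0, 0.0], 0),
    ([1.0, 0.0, 0.0, 0.0, 1.0], 0),
]
w, bias = train_perceptron(samples)
```

Each training example is a pair of records plus a label, which matches the description's framing of features as model inputs and labels as prediction targets.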

As mentioned above, both the subject comparison model 121 in the attention subject comparison module 12 and the information comparison model 131 in the duplicate information comparison module 13 can be generated by a supervised machine learning algorithm. The subject comparison model 121 compares the news articles with the cross-comparison information source 22 for attention subjects, and the information comparison model 131 compares each received news article with the target file. In both models, the supervised learning algorithm is a DNN-LSTM model that combines two different algorithms: a deep neural network (DNN) and long short-term memory (LSTM). In this specific embodiment, the subject comparison model 121 and the information comparison model 131 therefore combine the long-term memory capability of the LSTM with the deep learning capability of the DNN to identify, compare, and analyze news articles. In actual applications, however, the supervised learning algorithm used in the attention list collating and analyzing system is not limited to this. By introducing artificial intelligence in this way, the attention list collating and analyzing system of the present invention uses supervised learning to simplify the repeated comparison process and to reduce both the misjudgment rate and the duplication rate of the list.
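To show how the two parts of a DNN-LSTM fit together, here is an untrained forward pass in plain Python. An LSTM reads a sequence of token vectors, and a small dense (DNN) head maps its final hidden state to a score between 0 and 1. The dimensions, the random initialization, the single ReLU layer, the sigmoid output, and the class name `DnnLstm` are assumptions. The patent does not disclose the architecture details, and training (backpropagation) is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class DnnLstm:
    """Untrained forward pass only: LSTM encoder followed by a dense (DNN) head."""

    GATES = ("i", "f", "o", "c")  # input, forget, output gates and cell candidate

    def __init__(self, input_dim, hidden_dim, dense_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        concat = input_dim + hidden_dim
        self.W = {g: rand_matrix(hidden_dim, concat, rng) for g in self.GATES}
        self.b = {g: [0.0] * hidden_dim for g in self.GATES}
        self.W1 = rand_matrix(dense_dim, hidden_dim, rng)  # dense hidden layer
        self.W2 = rand_matrix(1, dense_dim, rng)           # output layer

    def step(self, x, h, c):
        z = list(x) + h  # concatenation [x; h]
        pre = {g: [s + bias for s, bias in zip(matvec(self.W[g], z), self.b[g])]
               for g in self.GATES}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]  # new cell state
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]                   # new hidden state
        return h, c

    def forward(self, sequence):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in sequence:
            h, c = self.step(x, h, c)
        dense = [max(0.0, v) for v in matvec(self.W1, h)]  # ReLU dense layer
        return sigmoid(matvec(self.W2, dense)[0])           # score in (0, 1)


model = DnnLstm(input_dim=3, hidden_dim=4, dense_dim=4)
score = model.forward([[0.1, 0.2, 0.3], [0.0, 1.0, 0.0]])
```

The division of labor matches the description: the LSTM carries context across the token sequence, and the dense layers turn that summary into a decision score, for example whether two articles describe the same event.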

In summary, the present invention provides an attention list collating and analyzing method and system. An attention subject comparison module screens news articles according to the attention subjects and attention topics the user cares about. A duplicate information comparison module then compares the screened news articles with the target file and updates the target file according to the comparison results. With this method and system, useful and required information can be obtained quickly from news media search results on the network. In addition, by introducing artificial intelligence, supervised machine learning reduces the misjudgment rate and duplication rate of the attention list and filters out invalid information. This lowers the rate of human error and the time spent on manual review, thereby improving efficiency and reducing time and labor costs. Furthermore, through the automatic delivery module, users receive the latest attention list and its related information promptly or periodically, so they need not spend extra time repeating searches, which further reduces labor and time costs and improves overall search and retrieval efficiency.

The above detailed description of the preferred specific embodiments is intended to describe the features and spirit of the present invention more clearly, not to limit the scope of the present invention to the preferred specific embodiments disclosed above. On the contrary, it is intended to cover various modifications and equivalent arrangements within the scope of the claims of the present invention. The scope of the claims should therefore be given the broadest interpretation in light of the above description, so as to cover all possible modifications and equivalent arrangements.

S1~S4: Steps

Claims

1. An attention list collating and analyzing method for analyzing and comparing a plurality of news articles from at least one news media information source to generate an attention list and related information, the method comprising the following steps: a news collection module collecting the news articles from the at least one news media information source; an attention subject comparison module using a natural language technology to identify whether the news articles contain an attention subject, then identifying whether the content of the news articles containing the attention subject matches an attention topic, and extracting at least one keyword from the news articles that contain the attention subject and match the attention topic; a duplicate information comparison module receiving, from the attention subject comparison module, each of the news articles that contain the attention subject and match the attention topic together with the corresponding at least one keyword, and comparing each received news article with a target file, wherein when the comparison result between a first news article among the news articles and the target file is that the attention subject is repeated and there is no information difference, the duplicate information comparison module deletes the first news article, and when the comparison result between the first news article and the target file is that the attention subject is not repeated, the duplicate information comparison module adds the first news article to the target file to update the target file; and an integration module generating the attention list and related information according to the target file.

2. The attention list collating and analyzing method of claim 1, further comprising the following step: an automatic delivery module sending the attention list and related information generated from the target file to the outside in the form of a message notification.

3. The attention list collating and analyzing method of claim 1, wherein a user inputs the attention subject and sets the attention topic through a user interface.

4. The attention list collating and analyzing method of claim 1, wherein the attention subject comparison module compares the news articles with a cross-comparison information source for the attention subject through a subject comparison model, the subject comparison model being generated by a machine learning algorithm, and wherein the cross-comparison information source further includes a business database, a value-added information database, and a website for querying digital footprints.

5. The attention list collating and analyzing method of claim 1, further comprising the following step: when the comparison result between the first news article and the target file is that the attention subject is repeated and there is an information change, the duplicate information comparison module adds the information change to the target file to update the target file.

6. The attention list collating and analyzing method of claim 1, further comprising the following step: the duplicate information comparison module comparing each of the received news articles with the target file through an information comparison model, wherein the information comparison model is generated by a supervised machine learning algorithm.

7. An attention list collating and analyzing system for analyzing a plurality of news articles from at least one news media information source to generate an attention list and related information, the system comprising: a news collection module, configured to connect to the at least one news media information source to collect the news articles therefrom; an attention subject comparison module, coupled to the news collection module to receive the news articles, configured to identify through a natural language technology whether the news articles contain an attention subject, then identify whether the content of the news articles containing the attention subject matches an attention topic, and extract at least one keyword from the news articles that contain the attention subject and match the attention topic; a duplicate information comparison module, coupled to the attention subject comparison module, configured to receive from the attention subject comparison module each of the news articles that contain the attention subject and match the attention topic together with the corresponding at least one keyword, to compare each received news article with a target file, and to update the target file according to the comparison result between each news article and the target file, wherein when the comparison result between a first news article among the news articles and the target file is that the attention subject is repeated and there is no information difference, the duplicate information comparison module deletes the first news article, and when the comparison result between the first news article and the target file is that the attention subject is not repeated, the duplicate information comparison module adds the first news article to the target file to update the target file; and an integration module, coupled to the duplicate information comparison module, configured to generate the attention list and related information according to the target file.

8. The attention list collating and analyzing system of claim 7, further comprising an automatic delivery module coupled to the integration module and configured to send the attention list and related information generated from the target file to the outside in the form of a message notification.

9. The attention list collating and analyzing system of claim 7, wherein the attention subject comparison module is coupled to a cross-comparison information source and compares the news articles with the cross-comparison information source for the attention subject, and the cross-comparison information source further includes a business database, a value-added information database, and a website for querying digital footprints.

10. The attention list collating and analyzing system of claim 7, wherein the attention subject comparison module includes a subject comparison model and the duplicate information comparison module includes an information comparison model, the subject comparison model and the information comparison model being generated by a supervised machine learning algorithm; the subject comparison model is used to compare the news articles with a cross-comparison information source for the attention subject, and the information comparison model is used to compare each received news article with the target file.