CN114218201A - Method for removing duplication of bank fund flow data - Google Patents
- Publication number
- CN114218201A
- Authority
- CN
- China
- Prior art keywords
- flow data
- fund flow
- fund
- balance
- transaction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/02—Banking, e.g. interest calculation or account maintenance
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/10—Tax strategies
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Development Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Databases & Information Systems (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention relates to a deduplication method for bank fund flow data, comprising the following steps: step 1, providing fund flow data; step 2, storing the first-ranked fund flow data in an effective fund flow data set; step 3, if the current fund flow data is completely consistent with any effective fund flow data, storing it in a suspected fund flow data set, otherwise skipping to step 4; step 4, storing the current fund flow data in the effective fund flow data set; step 5, repeating step 3 and step 4 to obtain an end-of-period balance; when the end-of-period balance is consistent with the balance in the account card, skipping to step 6, otherwise skipping to step 7; step 6, deleting the fund flow data in the suspected fund flow data set; step 7, judging that the fund flow data in the suspected fund flow data set is non-repeated fund flow data; step 8, ending the deduplication. The invention can effectively remove repeated data from bank fund flow data.
Description
Technical Field
The invention relates to the cleaning of bank fund data, and in particular to a method for deduplicating bank fund flow data.
Background
In many working scenarios, such as tax inspection, bank fund flow data generally needs to be retrieved. In practice, the retrieved bank fund flow data comes in various formats and from various channels. To facilitate subsequent processing, the retrieved data generally needs to be converted into a receipt and payment format. When the bank fund flow data lacks transaction time information, the characteristics of the receipt and payment format mean that the converted data may contain repeated records.
When receipt and payment format data containing repeated records is used for case analysis and judgment and similar purposes, the accuracy and reliability of the analysis are seriously affected, and the actual data usage requirements are difficult to meet.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method for deduplicating bank fund flow data that can effectively remove repeated data from bank fund flow data lacking transaction time, meets the usage requirements of actual bank fund flow data, and is safe and reliable.
According to the technical scheme provided by the invention, the method for deduplicating bank fund flow data comprises the following steps:
step 1, providing fund flow data in a receipt and payment format; if checking confirms that the fund flow data in the receipt and payment format contains repeated records, skipping to step 2, otherwise skipping to step 8;
step 2, sorting the fund flow data in the receipt and payment format by transaction date, and storing the first-ranked fund flow data after sorting in an established effective fund flow data set as the current effective fund flow data;
step 3, taking the fund flow data ranked next after the current effective fund flow data as the current fund flow data and comparing it with all effective fund flow data in the effective fund flow data set; if the current fund flow data is completely consistent with any effective fund flow data, storing the current fund flow data in an established suspected fund flow data set, otherwise skipping to step 4;
step 4, calculating the post-transaction balance after the current fund flow data, wherein the post-transaction balance is the sum of the fund balance after the current effective fund flow data and the transaction amount corresponding to the current fund flow data; after the post-transaction balance is calculated, storing the current fund flow data in the effective fund flow data set as the current effective fund flow data;
step 5, repeating step 3 and step 4 until the last-ranked fund flow data has been allocated to the suspected fund flow data set or the effective fund flow data set, thereby obtaining an end-of-period balance; when the end-of-period balance is consistent with the balance in the account card, skipping to step 6, otherwise skipping to step 7;
step 6, deleting the fund flow data in the suspected fund flow data set as repeated fund flow data, and skipping to step 8;
step 7, judging that the fund flow data in the suspected fund flow data set is non-repeated fund flow data, and skipping to step 8;
step 8, ending the deduplication.
In step 1, the process of checking and confirming whether the fund flow data in the receipt and payment format contains repeated records comprises the following steps:
step 1.1, determining the daily repeated-item ratio Nx of the fund flow data according to the transaction date of the fund flow data in the receipt and payment format, and determining the number Dx of days on which repeated data exists according to the daily repeated-item ratio Nx;
step 1.2, when the number Dx of days with repeated data is greater than 1, or the daily repeated-item ratio Nx is greater than the repeated-item ratio threshold, determining that the fund flow data in the receipt and payment format contains repeated records.
The daily repeated-item ratio Nx = (number of repeated transaction amounts on a transaction date) / (total number of transaction records on that transaction date).
The fund flow data in the receipt and payment format comprises a transaction date, a transaction counterparty, a transaction amount and a receipt and payment flag.
In step 5, when the end-of-period balance is not equal to the balance in the account card, the transaction amounts of all fund flow data in the suspected fund flow data set are added to the end-of-period balance; if the accumulated sum is equal to the balance in the account card, step 7 is executed, otherwise skipping to step 6.
The advantages of the invention are as follows: for bank fund flow data lacking transaction time, repeated data can be effectively removed, the usage requirements of actual bank fund flow data are met, and the method is safe and reliable.
Drawings
FIG. 1 is a flow chart of the invention for checking whether the fund flow data in the receipt and payment format contains repeated records.
FIG. 2 is a de-duplication flow chart of the present invention.
Detailed Description
The invention is further illustrated by the following specific figures and examples.
As shown in FIG. 2, for bank fund flow data lacking transaction time, in order to effectively remove repeated data and meet the usage requirements of actual bank fund flow data, the deduplication method comprises the following steps:
Step 1, providing fund flow data in a receipt and payment format; if checking confirms that the fund flow data in the receipt and payment format contains repeated records, skipping to step 2, otherwise skipping to step 8;
Specifically, the fund flow data in the receipt and payment format may be obtained by conversion using existing common technical means. Each fund flow data record in the receipt and payment format includes a transaction date, a transaction counterparty, a transaction amount and a receipt and payment flag; that is, the fund flow data in the receipt and payment format lacks transaction time information.
FIG. 1 shows the flow for checking whether the fund flow data in the receipt and payment format contains repeated records. Specifically, the checking and confirming process includes:
Step 1.1, determining the daily repeated-item ratio Nx of the fund flow data according to the transaction date of the fund flow data in the receipt and payment format, and determining the number Dx of days on which repeated data exists according to the daily repeated-item ratio Nx;
Specifically, the daily repeated-item ratio Nx = (number of repeated transaction amounts on a transaction date) / (total number of transaction records on that transaction date). Because the fund flow data contains transaction date information, the number of repeated transaction amounts on a given transaction date can be conveniently counted by transaction date, and the total number of transaction records on that date can be counted directly, which yields the daily repeated-item ratio Nx.
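The ratio described above can be sketched in Python as follows. This is only an illustrative sketch: the function name `daily_repeat_ratio` and the (date, amount) input shape are hypothetical, and it assumes that a record counts as "repeated" when its transaction amount occurs more than once on the same transaction date, which the text does not spell out.

```python
from collections import Counter, defaultdict


def daily_repeat_ratio(records):
    """Return {transaction_date: Nx} for an iterable of (transaction_date, amount) pairs.

    Assumption (not stated in the source): a record is "repeated" when its
    amount occurs more than once on the same transaction date.
    """
    amounts_by_date = defaultdict(list)
    for transaction_date, amount in records:
        amounts_by_date[transaction_date].append(amount)

    ratios = {}
    for transaction_date, amounts in amounts_by_date.items():
        counts = Counter(amounts)
        # Count every record whose amount appears at least twice that day.
        repeated = sum(n for n in counts.values() if n > 1)
        ratios[transaction_date] = repeated / len(amounts)
    return ratios


sample = [
    ("2021-11-01", 100),
    ("2021-11-01", 100),
    ("2021-11-01", 50),
    ("2021-11-02", 30),
]
ratios = daily_repeat_ratio(sample)
```

In this sample, two of the three records on 2021-11-01 share an amount, so Nx for that date is 2/3, while 2021-11-02 has no repeated amounts.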
Step 1.2, when the number Dx of days with repeated data is greater than 1, or the daily repeated-item ratio Nx is greater than the repeated-item ratio threshold, determining that the fund flow data in the receipt and payment format contains repeated records.
Specifically, the repeated-item ratio threshold may be selected according to actual needs; generally it may be set to 20%. That is, when the daily repeated-item ratio Nx is greater than 0.2, or when the number Dx of days with repeated data is greater than 1, it can be directly confirmed that the fund flow data in the receipt and payment format contains repeated records and the subsequent deduplication steps are required; otherwise, it can be confirmed that the fund flow data does not need deduplication and can be used directly.
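The check in step 1.2 might be expressed as the following Python sketch. The function name `has_repeated_data` is hypothetical, and two interpretations are assumed rather than taken from the source: a day counts toward Dx when its Nx is greater than zero, and the threshold condition is met when any single day exceeds it.

```python
def has_repeated_data(daily_ratios, ratio_threshold=0.2):
    """Decide whether deduplication is needed, given {transaction_date: Nx}.

    Assumptions (not stated in the source): a day "has repeated data" when
    Nx > 0, and the threshold test applies to any single day.
    """
    # Dx: number of days on which repeated data exists.
    dx = sum(1 for nx in daily_ratios.values() if nx > 0)
    exceeds_threshold = any(nx > ratio_threshold for nx in daily_ratios.values())
    return dx > 1 or exceeds_threshold


one_mild_day = has_repeated_data({"2021-11-01": 0.1, "2021-11-02": 0.0})
two_mild_days = has_repeated_data({"2021-11-01": 0.1, "2021-11-02": 0.1})
one_heavy_day = has_repeated_data({"2021-11-01": 0.25})
```

With the default 20% threshold, a single day at 10% does not trigger deduplication, whereas two such days (Dx = 2) or one day at 25% does.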
Step 2, sorting the fund flow data in the receipt and payment format by transaction date, and storing the first-ranked fund flow data after sorting in an established effective fund flow data set as the current effective fund flow data;
Specifically, since each record contains a transaction date, the fund flow data can be sorted directly by transaction date. The transaction date generally consists of year, month and day, and the records are sorted in ascending date order. The first-ranked fund flow data is the data of the first transaction made with a given bank card. This is well known to those skilled in the art and is not described further here.
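A minimal Python sketch of the sorting step is given below. The `Record` type and its field names are hypothetical stand-ins for the four fields of the receipt and payment format. Because the data has no transaction time, the sketch relies on the stability of Python's `sorted()` so that records sharing a date keep their input order; that tie-breaking choice is an assumption, not something the text prescribes.

```python
from datetime import date
from decimal import Decimal
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical record type for the receipt and payment format."""
    transaction_date: date
    counterparty: str
    amount: Decimal
    flag: str  # "receipt" or "payment"


def sort_by_transaction_date(records):
    # sorted() is stable: records sharing a date keep their input order.
    return sorted(records, key=lambda r: r.transaction_date)


records = [
    Record(date(2021, 11, 2), "Company B", Decimal("30.00"), "payment"),
    Record(date(2021, 11, 1), "Company A", Decimal("100.00"), "receipt"),
    Record(date(2021, 11, 2), "Company C", Decimal("45.00"), "receipt"),
]
ordered = sort_by_transaction_date(records)
first_ranked = ordered[0]  # becomes the first effective record in step 2
```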
The established effective fund flow data set may take the form of a common database or the like, selected according to actual needs, which is not described further here. Once the first-ranked fund flow data is stored in the effective fund flow data set, it serves as the current effective fund flow data, which is the basis for the subsequent deduplication comparison.
Step 3, taking the fund flow data ranked next after the current effective fund flow data as the current fund flow data and comparing it with all effective fund flow data in the effective fund flow data set; if the current fund flow data is completely consistent with any effective fund flow data, storing the current fund flow data in an established suspected fund flow data set, otherwise skipping to step 4;
Specifically, when the first-ranked fund flow data is the current effective fund flow data, the fund flow data ranked next after it is the second-ranked fund flow data, and so on. All fund flow data in the effective fund flow data set is effective fund flow data, and the fund flow data most recently stored in the effective fund flow data set serves as the current effective fund flow data.
In the embodiment of the invention, the fund flow data ranked next after the current effective fund flow data is compared with all effective fund flow data. The fund flow data is completely consistent with an effective fund flow data record when the transaction date, transaction amount, transaction counterparty and receipt and payment flag of the two are all identical; if any one of these differs, the two are considered different.
Specifically, when the fund flow data is completely consistent with an effective fund flow data record, it is considered suspected repeated data, and the current fund flow data is stored in the suspected fund flow data set. For the specific form of the suspected fund flow data set, refer to the description of the effective fund flow data set, which is not repeated here.
When the fund flow data is not completely identical to any effective fund flow data, skip to step 4.
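The complete-consistency comparison of step 3 can be illustrated with a short Python sketch. The `Record` type and the `is_suspected` helper are hypothetical. The sketch assumes that the effective set is held in memory as a Python `set`, so that tuple equality over all four fields gives the comparison directly; the text itself only says the set may be a common database.

```python
from decimal import Decimal
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical record type; equality compares all four fields."""
    transaction_date: str  # ISO "YYYY-MM-DD"
    counterparty: str
    amount: Decimal
    flag: str  # "receipt" or "payment"


def is_suspected(record, effective_records):
    # A record is suspected when it matches an effective record on
    # date, counterparty, amount and flag simultaneously.
    return record in effective_records


effective = {Record("2021-11-01", "Company A", Decimal("100.00"), "receipt")}
same = Record("2021-11-01", "Company A", Decimal("100.00"), "receipt")
other_party = Record("2021-11-01", "Company B", Decimal("100.00"), "receipt")

same_is_suspected = is_suspected(same, effective)
other_is_suspected = is_suspected(other_party, effective)
```

A record that differs in even one field, such as the counterparty here, is not treated as suspected and proceeds to step 4.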
Step 4, calculating the post-transaction balance after the current fund flow data, wherein the post-transaction balance is the sum of the fund balance after the current effective fund flow data and the transaction amount corresponding to the current fund flow data; after the post-transaction balance is calculated, storing the current fund flow data in the effective fund flow data set as the current effective fund flow data;
Specifically, the post-transaction balance is the sum of the fund balance after the current effective fund flow data and the transaction amount corresponding to the current fund flow data. For example, if the first-ranked fund flow data is effective fund flow data and the second-ranked fund flow data is not identical to it, the post-transaction balance after the second-ranked fund flow data needs to be calculated, that is, the sum of the post-transaction balance after the first-ranked fund flow data and the transaction amount corresponding to the second-ranked fund flow data. The sign of the transaction amount depends on the receipt and payment flag: when the flag indicates a receipt, the transaction amount is added directly; when the flag indicates a payment, the transaction amount is taken as the corresponding negative value. This is well known to those skilled in the art and is not described further here.
In the embodiment of the invention, after the post-transaction balance is calculated, the current fund flow data is stored in the effective fund flow data set and serves as the current effective fund flow data. For example, once the second-ranked fund flow data is stored in the effective fund flow data set, the current effective fund flow data is the second-ranked fund flow data, while the first-ranked fund flow data remains in the effective fund flow data set only as effective fund flow data.
In a specific implementation, the post-transaction balance serves as the fund balance for subsequent calculations. For example, when the second-ranked fund flow data has been stored in the effective fund flow data set as the current effective fund flow data and step 4 is executed for the third-ranked or other subsequent fund flow data, the fund balance after the current effective fund flow data is that post-transaction balance. The remaining cases follow by analogy and are not illustrated one by one here.
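The balance update of step 4 can be sketched in Python as follows. The helper names and the flag values "receipt" and "payment" are hypothetical, and `Decimal` is used here as a design choice to avoid binary floating-point rounding when comparing balances later; the text does not prescribe a numeric type.

```python
from decimal import Decimal


def signed_amount(amount, flag):
    """Return the amount with its sign set by the receipt and payment flag."""
    if flag == "receipt":
        return amount
    if flag == "payment":
        return -amount
    raise ValueError(f"unknown receipt and payment flag: {flag!r}")


def post_transaction_balance(previous_balance, amount, flag):
    # Balance after the current effective record plus the signed amount
    # of the record now being processed.
    return previous_balance + signed_amount(amount, flag)


balance = Decimal("100.00")
balance = post_transaction_balance(balance, Decimal("50.00"), "receipt")
after_receipt = balance
balance = post_transaction_balance(balance, Decimal("30.00"), "payment")
after_payment = balance
```

Each result becomes the fund balance used when the next record reaches step 4.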
Step 5, repeating step 3 and step 4 until the last-ranked fund flow data has been allocated to the suspected fund flow data set or the effective fund flow data set, thereby obtaining an end-of-period balance; when the end-of-period balance is consistent with the balance in the account card, skipping to step 6, otherwise skipping to step 7;
Specifically, there are multiple sorted fund flow data records, and steps 3 and 4 need to be repeated until the last-ranked fund flow data has been allocated to the suspected fund flow data set or the effective fund flow data set. In the embodiment of the invention, the end-of-period balance is the post-transaction balance calculated in step 4. When the last-ranked fund flow data is allocated to the effective fund flow data set, the end-of-period balance is the post-transaction balance calculated using the transaction amount of the last-ranked fund flow data; otherwise, the end-of-period balance is the post-transaction balance calculated using the transaction amount of the fund flow data most recently allocated to the effective fund flow data set.
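Putting steps 2 to 5 together, the loop might look like the Python sketch below. The `Record` type, the `partition` function and the `opening_balance` parameter are hypothetical. The source does not state a starting balance, so the sketch assumes zero by default, on the reading that the first-ranked record is the card's first transaction.

```python
from decimal import Decimal
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical record type for the receipt and payment format."""
    transaction_date: str  # ISO "YYYY-MM-DD", so string order is date order
    counterparty: str
    amount: Decimal
    flag: str  # "receipt" or "payment"


def partition(records, opening_balance=Decimal("0")):
    """Split records into effective and suspected sets and track the balance.

    Assumption: opening_balance is 0 unless the caller knows otherwise.
    """
    effective, suspected = [], []
    seen = set()  # effective records, for the complete-consistency check
    balance = opening_balance
    for record in sorted(records, key=lambda r: r.transaction_date):
        if record in seen:
            suspected.append(record)  # step 3: balance is left unchanged
            continue
        # Step 4: update the balance, then store the record as effective.
        balance += record.amount if record.flag == "receipt" else -record.amount
        effective.append(record)
        seen.add(record)
    return effective, suspected, balance  # balance is the end-of-period balance


rows = [
    Record("2021-11-01", "Company A", Decimal("100.00"), "receipt"),
    Record("2021-11-01", "Company A", Decimal("100.00"), "receipt"),
    Record("2021-11-02", "Company B", Decimal("30.00"), "payment"),
]
effective, suspected, end_balance = partition(rows)
```

Because suspected records never touch the running balance, the end-of-period balance reflects only the effective set, whichever set the last-ranked record lands in.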
Step 6, deleting the fund flow data in the suspected fund flow data set as repeated fund flow data, and skipping to step 8;
Specifically, when the end-of-period balance is consistent with the balance in the account card, the fund flow data in the suspected fund flow data set is considered repeated fund flow data and needs to be deleted; after deletion, skip to step 8. The balance in the account card is basic account information that can be obtained directly, which is well known to those skilled in the art and is not described further here.
Step 7, judging that the fund flow data in the suspected fund flow data set is non-repeated fund flow data, and skipping to step 8;
Specifically, when the end-of-period balance is inconsistent with the balance in the account card, the fund flow data in the suspected fund flow data set cannot be directly considered repeated fund flow data.
In the embodiment of the invention, when the end-of-period balance is not equal to the balance in the account card, the transaction amounts of all fund flow data in the suspected fund flow data set are added to the end-of-period balance; if the accumulated sum is equal to the balance in the account card, step 7 is executed, otherwise skip to step 6.
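The decision between steps 6 and 7 can be summarised in the Python sketch below. The function name and the "delete"/"keep" return values are hypothetical, and the sketch assumes the suspected amounts are already signed according to their receipt and payment flags. It follows the embodiment as written, in which the suspected records are kept only when adding them back reproduces the balance in the account card.

```python
from decimal import Decimal


def resolve_suspected(end_balance, suspected_signed_amounts, card_balance):
    """Return "delete" (step 6) or "keep" (step 7) for the suspected set.

    Assumption: suspected_signed_amounts are positive for receipts and
    negative for payments.
    """
    if end_balance == card_balance:
        return "delete"  # the effective records alone explain the balance
    total = end_balance + sum(suspected_signed_amounts, Decimal("0"))
    if total == card_balance:
        return "keep"  # the suspected records are genuine transactions
    return "delete"  # per the embodiment, fall back to step 6


matches_without = resolve_suspected(Decimal("70"), [Decimal("100")], Decimal("70"))
matches_with = resolve_suspected(Decimal("70"), [Decimal("100")], Decimal("170"))
matches_neither = resolve_suspected(Decimal("70"), [Decimal("100")], Decimal("999"))
```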
Step 8, ending the deduplication.
Specifically, when the deduplication ends, the deduplication of all bank fund flow data lacking transaction time is complete.
Claims (5)
1. A method for deduplicating bank fund flow data, characterized by comprising the following steps:
step 1, providing fund flow data in a receipt and payment format; if checking confirms that the fund flow data in the receipt and payment format contains repeated records, skipping to step 2, otherwise skipping to step 8;
step 2, sorting the fund flow data in the receipt and payment format by transaction date, and storing the first-ranked fund flow data after sorting in an established effective fund flow data set as the current effective fund flow data;
step 3, taking the fund flow data ranked next after the current effective fund flow data as the current fund flow data and comparing it with all effective fund flow data in the effective fund flow data set; if the current fund flow data is completely consistent with any effective fund flow data, storing the current fund flow data in an established suspected fund flow data set, otherwise skipping to step 4;
step 4, calculating the post-transaction balance after the current fund flow data, wherein the post-transaction balance is the sum of the fund balance after the current effective fund flow data and the transaction amount corresponding to the current fund flow data; after the post-transaction balance is calculated, storing the current fund flow data in the effective fund flow data set as the current effective fund flow data;
step 5, repeating step 3 and step 4 until the last-ranked fund flow data has been allocated to the suspected fund flow data set or the effective fund flow data set, thereby obtaining an end-of-period balance; when the end-of-period balance is consistent with the balance in the account card, skipping to step 6, otherwise skipping to step 7;
step 6, deleting the fund flow data in the suspected fund flow data set as repeated fund flow data, and skipping to step 8;
step 7, judging that the fund flow data in the suspected fund flow data set is non-repeated fund flow data, and skipping to step 8;
step 8, ending the deduplication.
2. The method for deduplicating bank fund flow data according to claim 1, wherein in step 1 the process of checking and confirming whether the fund flow data in the receipt and payment format contains repeated records comprises the following steps:
step 1.1, determining the daily repeated-item ratio Nx of the fund flow data according to the transaction date of the fund flow data in the receipt and payment format, and determining the number Dx of days on which repeated data exists according to the daily repeated-item ratio Nx;
step 1.2, when the number Dx of days with repeated data is greater than 1, or the daily repeated-item ratio Nx is greater than the repeated-item ratio threshold, determining that the fund flow data in the receipt and payment format contains repeated records.
3. The method for deduplicating bank fund flow data according to claim 2, wherein the daily repeated-item ratio Nx = (number of repeated transaction amounts on a transaction date) / (total number of transaction records on that transaction date).
4. The method for deduplicating bank fund flow data according to any one of claims 1 to 3, wherein the fund flow data in the receipt and payment format comprises a transaction date, a transaction counterparty, a transaction amount and a receipt and payment flag.
5. The method for deduplicating bank fund flow data according to any one of claims 1 to 3, wherein in step 5, when the end-of-period balance is not equal to the balance in the account card, the transaction amounts of all fund flow data in the suspected fund flow data set are added to the end-of-period balance; if the accumulated sum is equal to the balance in the account card, step 7 is executed, otherwise skipping to step 6.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111429072.4A CN114218201B (en) | 2021-11-29 | 2021-11-29 | Method for removing duplication of bank funds flow data |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111429072.4A CN114218201B (en) | 2021-11-29 | 2021-11-29 | Method for removing duplication of bank funds flow data |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN114218201A (en) | 2022-03-22 |
| CN114218201B (en) | 2024-10-15 |
Family
ID=80698720
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202111429072.4A Active CN114218201B (en) | 2021-11-29 | 2021-11-29 | Method for removing duplication of bank funds flow data |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114218201B (en) |
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100306088A1 (en) * | 2008-08-01 | 2010-12-02 | Hantz Group, Inc. | Single or multi-company business accounting system and method for same including account number maintenance |
| US11094020B1 (en) * | 2018-02-01 | 2021-08-17 | Dailypay, Inc. | Methods and apparatus for constructing machine learning models to process user data and provide advance access to payments |
| CN112581268A (en) * | 2019-09-30 | 2021-03-30 | 北京宸瑞科技股份有限公司 | Mass fund transaction data intelligence analysis method and system |
Non-Patent Citations (2)
| Title |
|---|
| 吴志涛: "电信大客户网管告警数据分析平台的开发与应用", 中国优秀硕士学位论文全文数据库 (信息科技辑), 15 May 2021 (2021-05-15), pages 138 - 626 * |
| 王慧;: "零余额账户管理的问题及建议", 金融科技时代, no. 12, 10 December 2014 (2014-12-10), pages 78 - 79 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN114218201B (en) | 2024-10-15 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||