
CN114218201A - Method for removing duplication of bank fund flow data - Google Patents


Info

Publication number
CN114218201A
Authority
CN
China
Prior art keywords
flow data
fund flow
fund
balance
transaction
Prior art date
2021-11-29
Legal status
Granted
Application number
CN202111429072.4A
Other languages
Chinese (zh)
Other versions
CN114218201B (en)
Inventor
Yan Junfeng (严俊峰)
Current Assignee
Jiangsu Tax Software Technology Co ltd
Original Assignee
Jiangsu Tax Software Technology Co ltd
Priority date
2021-11-29
Filing date
2021-11-29
Publication date
2022-03-22
2021-11-29 Application filed by Jiangsu Tax Software Technology Co ltd
2021-11-29 Priority to CN202111429072.4A
2022-03-22 Publication of CN114218201A
Application granted
2024-10-15 Publication of CN114218201B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/10 Tax strategies

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention relates to a de-duplication method for bank fund flow data, comprising the following steps: step 1, provide fund flow data; step 2, store the first fund flow record in a valid fund flow data set; step 3, if the next fund flow record is completely identical to any valid record, store it in a suspected fund flow data set, otherwise go to step 4; step 4, store the current record in the valid fund flow data set; step 5, repeat steps 3 and 4 to obtain the end-of-period balance, and go to step 6 if it matches the balance on the account card, otherwise go to step 7; step 6, delete the records in the suspected fund flow data set; step 7, judge the records in the suspected fund flow data set to be non-duplicate records; step 8, end de-duplication. The invention can effectively remove duplicate records from bank fund flow data.

Description

Method for removing duplication of bank fund flow data
Technical Field
The invention relates to the cleaning of bank fund data, and in particular to a method for de-duplicating bank fund flow data.
Background
In work such as tax inspections, bank fund flow data usually has to be retrieved, and in practice the retrieved data comes in many formats and through many channels. To simplify subsequent processing, the retrieved data generally has to be converted into a receipt-and-payment format. When the original records lack the transaction time, the nature of the receipt-and-payment format means that the converted records may contain duplicates.
When duplicated receipt-and-payment records are used in case analysis and similar work, they seriously impair the accuracy and reliability of the analysis and fail to meet practical data requirements.
Disclosure of Invention
The invention aims to overcome these shortcomings of the prior art by providing a de-duplication method for bank fund flow data that effectively removes duplicate records even when the transaction time is missing, meets the practical requirements for using bank fund flow data, and is safe and reliable.
According to the technical solution provided by the invention, the de-duplication method for bank fund flow data comprises the following steps:
step 1, provide fund flow data in the receipt-and-payment format; if a check confirms that the data contains duplicates, go to step 2, otherwise go to step 8;
step 2, sort the fund flow data by transaction date and store the first record after sorting in a newly established valid fund flow data set as the current valid record;
step 3, compare the record that follows the current valid record in the sorted order with all records in the valid fund flow data set; if it is completely identical to any valid record, store it in a newly established suspected fund flow data set, otherwise go to step 4;
step 4, calculate the post-transaction balance of the current record, which is the balance after the current valid record plus the transaction amount of the current record; after this balance is calculated, store the current record in the valid fund flow data set as the new current valid record;
step 5, repeat steps 3 and 4 until the last record in the sorted order has been allocated to either the suspected or the valid fund flow data set, yielding the end-of-period balance; if the end-of-period balance matches the balance on the account card, go to step 6, otherwise go to step 7;
step 6, delete the records in the suspected fund flow data set as duplicate records and go to step 8;
step 7, judge the records in the suspected fund flow data set to be non-duplicate records and go to step 8;
step 8, end de-duplication.
In step 1, checking and confirming whether the receipt-and-payment fund flow data contains duplicates comprises the following steps:
step 1.1, based on the transaction dates of the receipt-and-payment data, determine the daily repeat ratio Nx, and from the daily ratios determine the number Dx of days on which duplicates exist;
step 1.2, when Dx is greater than 1, or the daily repeat ratio Nx is greater than the repeat-ratio threshold, confirm that the receipt-and-payment data contains duplicates.
The daily repeat ratio Nx is the number of repeated transaction amounts on a given transaction date divided by the total number of transactions on that date.
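Written as formulas, the check in steps 1.1 and 1.2 reads roughly as follows. The symbols r(d), n(d) and θ are our own notation, and reading Dx as the count of dates with a non-zero ratio is an interpretation of "the number of days on which duplicates exist"; the detailed description below gives θ = 0.2 as a typical value.

```latex
N_x(d) = \frac{r(d)}{n(d)}, \qquad
D_x = \bigl|\{\, d : N_x(d) > 0 \,\}\bigr|, \qquad
\text{duplicates present} \iff D_x > 1 \;\lor\; \max_d N_x(d) > \theta
```

Here r(d) is the number of records on date d whose transaction amount is repeated on that date, and n(d) is the total number of records on date d.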
The fund flow data in the receipt-and-payment format comprises the transaction date, the transaction counterparty, the transaction amount and the receipt/payment flag.
In step 5, when the end-of-period balance is not equal to the balance on the account card, the transaction amounts of all records in the suspected fund flow data set are added to the end-of-period balance; if the sum equals the balance on the card, step 7 is executed, otherwise the method goes to step 6.
Advantages of the invention: for bank fund flow data lacking transaction time, duplicate records can be effectively removed, meeting the practical requirements for using bank fund flow data, and the method is safe and reliable.
Drawings
FIG. 1 is a flow chart of the invention for checking whether the receipt-and-payment fund flow data contains duplicates.
FIG. 2 is a de-duplication flow chart of the present invention.
Detailed Description
The invention is further described below with reference to the figures and a specific embodiment.
As shown in FIG. 2, to effectively remove duplicate records from bank fund flow data that lacks transaction time and to meet the practical requirements for its use, the de-duplication method comprises the following steps:
Step 1, provide fund flow data in the receipt-and-payment format; if a check confirms that the data contains duplicates, go to step 2, otherwise go to step 8;
Specifically, the receipt-and-payment fund flow data can be obtained by conversion using existing common techniques. Each record in this format contains a transaction date, a transaction counterparty, a transaction amount and a receipt/payment flag; that is, it lacks transaction time information.
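A minimal sketch of one receipt-and-payment record, assuming Python with `Decimal` amounts and hypothetical flag strings `"receipt"` and `"payment"`; the class name `FlowRecord` and its field names are ours, not the patent's. A frozen dataclass compares and hashes on all four fields, which matches the "completely identical" test used later in step 3.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

# Hypothetical flag values standing in for the receipt/payment mark.
RECEIPT = "receipt"
PAYMENT = "payment"


@dataclass(frozen=True)
class FlowRecord:
    """One receipt-and-payment record; note there is no time-of-day field."""

    trade_date: date     # transaction date (year-month-day only)
    counterparty: str    # transaction counterparty
    amount: Decimal      # unsigned transaction amount
    flag: str            # RECEIPT or PAYMENT

    def __post_init__(self) -> None:
        # Reject unknown flags early so later balance arithmetic is well defined.
        if self.flag not in (RECEIPT, PAYMENT):
            raise ValueError(f"unknown receipt/payment flag: {self.flag!r}")
        if self.amount < 0:
            raise ValueError("amount is stored unsigned; the flag carries the sign")


# Two same-day records with identical fields cannot be told apart,
# because no transaction time survives the conversion.
a = FlowRecord(date(2021, 11, 1), "Supplier A", Decimal("500.00"), PAYMENT)
b = FlowRecord(date(2021, 11, 1), "Supplier A", Decimal("500.00"), PAYMENT)
print(a == b)  # True
```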
As shown in FIG. 1, which is the flow chart for checking whether the receipt-and-payment fund flow data contains duplicates, the check comprises:
step 1.1, based on the transaction dates of the receipt-and-payment data, determine the daily repeat ratio Nx, and from the daily ratios determine the number Dx of days on which duplicates exist;
Specifically, the daily repeat ratio Nx is the number of repeated transaction amounts on a given transaction date divided by the total number of transactions on that date. Because each record contains its transaction date, both the number of repeated transaction amounts and the total number of transactions on each date can be counted directly, which gives the daily repeat ratio Nx.
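The daily ratio Nx and the day count Dx can be computed with a per-date tally. This sketch assumes records are reduced to `(trade_date, amount)` pairs and reads "the number of repeated transaction amounts" as the number of records whose amount also occurs on another record of the same date; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date
from decimal import Decimal


def daily_repeat_ratios(pairs):
    """Return {trade_date: Nx} for an iterable of (trade_date, amount) pairs.

    Assumption: the numerator counts every record on a date whose amount also
    appears on at least one other record of that same date.
    """
    amounts_by_day = defaultdict(list)
    for trade_date, amount in pairs:
        amounts_by_day[trade_date].append(amount)

    ratios = {}
    for trade_date, amounts in amounts_by_day.items():
        counts = Counter(amounts)
        repeated = sum(n for n in counts.values() if n > 1)  # records sharing an amount
        ratios[trade_date] = repeated / len(amounts)          # divided by all records that day
    return ratios


def repeat_days(ratios):
    """Dx, read here as the number of dates whose ratio is above zero."""
    return sum(1 for nx in ratios.values() if nx > 0)


rows = [
    (date(2021, 11, 1), Decimal("500.00")),
    (date(2021, 11, 1), Decimal("500.00")),
    (date(2021, 11, 1), Decimal("80.00")),
    (date(2021, 11, 1), Decimal("12.50")),
    (date(2021, 11, 2), Decimal("300.00")),
]
ratios = daily_repeat_ratios(rows)
print(ratios[date(2021, 11, 1)], repeat_days(ratios))  # 0.5 1
```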
Step 1.2, when the number Dx of days with duplicates is greater than 1, or the daily repeat ratio Nx is greater than the repeat-ratio threshold, confirm that the receipt-and-payment data contains duplicates.
Specifically, the repeat-ratio threshold can be chosen according to actual needs and is generally set to 20%. That is, when the daily repeat ratio Nx is greater than 0.2, or the number Dx of days with duplicates is greater than 1, the receipt-and-payment data is confirmed to contain duplicates and the subsequent de-duplication steps are required; otherwise, none of the data needs de-duplication and it can be used directly.
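The decision in step 1.2 can be sketched as a single predicate over the per-date ratios. The name `needs_dedup` is hypothetical, and testing "Nx is greater than the threshold" against the largest daily ratio (so any one day above 20% triggers de-duplication) is our reading of the text.

```python
from datetime import date


def needs_dedup(ratios, threshold=0.2):
    """Step 1.2: return True if de-duplication (steps 2 to 7) is required.

    ratios maps each transaction date to its daily repeat ratio Nx.
    Assumption: the threshold is compared with the largest daily ratio.
    """
    dx = sum(1 for nx in ratios.values() if nx > 0)   # days with repeats
    max_nx = max(ratios.values(), default=0.0)
    return dx > 1 or max_nx > threshold


# One day at Nx = 0.5 exceeds the 20% threshold.
print(needs_dedup({date(2021, 11, 1): 0.5, date(2021, 11, 2): 0.0}))  # True
# One day with a low ratio: neither condition holds, so go straight to step 8.
print(needs_dedup({date(2021, 11, 1): 0.1, date(2021, 11, 2): 0.0}))  # False
# Low ratios, but repeats on two different days: Dx = 2 > 1.
print(needs_dedup({date(2021, 11, 1): 0.1, date(2021, 11, 2): 0.1}))  # True
```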
Step 2, sort the fund flow data by transaction date and store the first record after sorting in a newly established valid fund flow data set as the current valid record;
Specifically, since each record has a transaction date, the records can be sorted directly by that date. Transaction dates generally take the form year-month-day and are sorted in ascending order, so the first record after sorting is the earliest transaction of the bank card concerned, as is well known to those skilled in the art.
The valid fund flow data set can take the form of an ordinary database or a similar store, chosen according to actual needs. Once the first record is stored in the valid fund flow data set, it becomes the current valid record, which serves as the basis for the subsequent de-duplication comparisons.
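Step 2 can be sketched as an ascending sort on the date field. This assumes records are `(trade_date, counterparty, amount, flag)` tuples; keeping same-day records in their delivered order (via Python's stable `sorted`) is our assumption, since the text does not say how ties are ordered.

```python
from datetime import date


def sort_by_date(records):
    """Step 2: ascending sort on the transaction date.

    sorted() is stable, so records sharing a date keep their original relative
    order (an assumption: it is the only ordering hint left without a time).
    """
    return sorted(records, key=lambda r: r[0])


rows = [
    (date(2021, 11, 2), "B", 300, "receipt"),
    (date(2021, 11, 1), "A", 500, "payment"),
    (date(2021, 11, 1), "C", 80, "receipt"),
]
ordered = sort_by_date(rows)
first = ordered[0]                 # seeds the valid set as the current valid record
print([r[1] for r in ordered])     # ['A', 'C', 'B']
```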
Step 3, compare the record that follows the current valid record in the sorted order with all records in the valid fund flow data set; if it is completely identical to any valid record, store it in a newly established suspected fund flow data set, otherwise go to step 4;
Specifically, when the first record is the current valid record, the record that follows it after sorting is the second record, and so on. All records in the valid fund flow data set are valid records, and the most recently stored one is the current valid record.
In this embodiment, the record that follows the current valid record is compared with all valid records. The two are completely identical when their transaction date, transaction amount, transaction counterparty and receipt/payment flag are all the same; if any one of these differs, the records are considered different.
Specifically, when a record is completely identical to some valid record, it is considered a suspected duplicate and is stored in the suspected fund flow data set. The suspected fund flow data set can be implemented in the same way as the valid fund flow data set.
When the record differs from every valid record, go to step 4.
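Step 3's comparison can be sketched as follows, assuming each record is a dict with the hypothetical keys `trade_date`, `counterparty`, `amount` and `flag`. A record is suspected only if all four fields match some valid record.

```python
from datetime import date
from decimal import Decimal

# The four fields step 3 compares; a difference in any one means "different".
FIELDS = ("trade_date", "counterparty", "amount", "flag")


def fully_identical(a, b):
    """True only when every compared field is equal."""
    return all(a[f] == b[f] for f in FIELDS)


def is_suspected(candidate, valid_records):
    """Step 3: compare the candidate with *all* valid records."""
    return any(fully_identical(candidate, v) for v in valid_records)


valid = [
    {"trade_date": date(2021, 11, 1), "counterparty": "Supplier A",
     "amount": Decimal("500.00"), "flag": "payment"},
]
same = dict(valid[0])
other_flag = dict(valid[0], flag="receipt")   # only the flag differs

print(is_suspected(same, valid), is_suspected(other_flag, valid))  # True False
```

The linear scan mirrors the wording "compared with all valid records" but costs O(n) per record; for long statements, keeping a set of `(trade_date, counterparty, amount, flag)` tuples gives the same answer with a hash lookup.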
Step 4, calculate the post-transaction balance of the current record, which is the balance after the current valid record plus the transaction amount of the current record; after this balance is calculated, store the current record in the valid fund flow data set as the new current valid record;
Specifically, the post-transaction balance is the balance after the current valid record plus the transaction amount of the current record. For example, if the first record is a valid record and the second record is not identical to it, the post-transaction balance of the second record must be calculated, namely the post-transaction balance of the first record plus the transaction amount of the second record. The sign of the transaction amount depends on the receipt/payment flag: when the flag indicates a receipt, the amount is added as a positive value; when it indicates a payment, the amount of the second record is taken as a negative value, as is well known to those skilled in the art.
In this embodiment, once the post-transaction balance has been calculated, the current record is stored in the valid fund flow data set and becomes the current valid record. For example, after the second record is stored in the valid fund flow data set, it is the current valid record, while the first record remains in the set merely as a valid record.
In implementation, each post-transaction balance serves as the starting balance for the next calculation. For example, once the second record is stored in the valid fund flow data set as the current valid record, then when step 4 is executed for the third or any later record, the balance after the current valid record is the second record's post-transaction balance, and so on.
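The balance rule in step 4 can be sketched with two small helpers. The flag strings `"receipt"` and `"payment"` and the function names are hypothetical; `Decimal` is used so that currency arithmetic stays exact.

```python
from decimal import Decimal

# Hypothetical flag values standing in for the receipt/payment mark.
RECEIPT = "receipt"
PAYMENT = "payment"


def signed_amount(amount, flag):
    """Receipts add to the balance; payments subtract from it."""
    if flag == RECEIPT:
        return amount
    if flag == PAYMENT:
        return -amount
    raise ValueError(f"unknown receipt/payment flag: {flag!r}")


def post_transaction_balance(previous_balance, amount, flag):
    """Step 4: balance after the current valid record plus this record's signed amount."""
    return previous_balance + signed_amount(amount, flag)


# Each result becomes the starting balance for the next record.
balance = Decimal("1000.00")   # post-transaction balance of the first record
for amount, flag in [(Decimal("250.00"), PAYMENT), (Decimal("40.00"), RECEIPT)]:
    balance = post_transaction_balance(balance, amount, flag)
print(balance)  # 790.00
```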
Step 5, repeat steps 3 and 4 until the last record in the sorted order has been allocated to either the suspected or the valid fund flow data set, yielding the end-of-period balance; if the end-of-period balance matches the balance on the account card, go to step 6, otherwise go to step 7;
Specifically, there are many sorted records, and steps 3 and 4 are repeated until the last record in the sorted order has been allocated to either the suspected or the valid fund flow data set. In this embodiment, the end-of-period balance is a post-transaction balance calculated in step 4: if the last record is allocated to the valid fund flow data set, the end-of-period balance is the post-transaction balance calculated with that record's transaction amount; otherwise, it is the post-transaction balance of the last record that was allocated to the valid fund flow data set.
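Steps 2 to 5 together can be sketched as one pass over the sorted records. This assumes `(trade_date, counterparty, amount, flag)` tuples with unsigned `Decimal` amounts; `opening_balance` is a hypothetical parameter, since the text does not say where the balance before the first record comes from. The returned balance is the post-transaction balance of the last record placed in the valid set, as described above.

```python
from datetime import date
from decimal import Decimal


def allocate(records, opening_balance=Decimal("0")):
    """Steps 2-5: return (valid, suspected, end_of_period_balance)."""

    def signed(rec):
        amount, flag = rec[2], rec[3]
        if flag not in ("receipt", "payment"):
            raise ValueError(f"unknown receipt/payment flag: {flag!r}")
        return amount if flag == "receipt" else -amount

    ordered = sorted(records, key=lambda r: r[0])   # step 2: stable sort by date
    valid, suspected = [], []
    valid_keys = set()                              # all four fields, for step 3
    balance = opening_balance
    for rec in ordered:
        if rec in valid_keys:                       # step 3: identical to a valid record
            suspected.append(rec)
            continue
        balance += signed(rec)                      # step 4: post-transaction balance
        valid.append(rec)                           # rec becomes the current valid record
        valid_keys.add(rec)
    return valid, suspected, balance                # step 5: end-of-period balance


rows = [
    (date(2021, 11, 1), "Supplier A", Decimal("500.00"), "payment"),
    (date(2021, 11, 1), "Client B", Decimal("800.00"), "receipt"),
    (date(2021, 11, 1), "Supplier A", Decimal("500.00"), "payment"),
    (date(2021, 11, 2), "Client B", Decimal("200.00"), "receipt"),
]
valid, suspected, end = allocate(rows, opening_balance=Decimal("1000.00"))
print(len(valid), len(suspected), end)  # 3 1 1500.00
```

Because the sort is stable, the first copy of a same-day duplicate is the one kept in the valid set, and each later copy lands in the suspected set.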
Step 6, delete the records in the suspected fund flow data set as duplicate records and go to step 8;
Specifically, when the end-of-period balance matches the balance on the account card, the records in the suspected fund flow data set are considered duplicates and are deleted, after which the method goes to step 8. The balance on the account card is part of the account's basic information and can be obtained directly, as is well known to those skilled in the art.
Step 7, judge the records in the suspected fund flow data set to be non-duplicate records and go to step 8;
Specifically, when the end-of-period balance does not match the balance on the account card, the records in the suspected fund flow data set cannot simply be treated as duplicates.
In this embodiment, when the end-of-period balance is not equal to the balance on the account card, the transaction amounts of all records in the suspected fund flow data set are added to the end-of-period balance; if the sum equals the balance on the card, step 7 is executed, otherwise the method goes to step 6.
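The reconciliation in steps 5 to 7, including the check on the summed suspected amounts, can be sketched as below. The function name and the `"delete"`/`"keep"` return values are hypothetical, and treating the suspected amounts as already signed by their receipt/payment flag is an assumption, since the text says only "transaction amount".

```python
from decimal import Decimal


def resolve_suspected(end_balance, card_balance, suspected_signed_amounts):
    """Return "delete" (step 6) or "keep" (step 7) for the suspected set.

    suspected_signed_amounts: amounts already signed by the receipt/payment
    flag (an assumption; the text says only "transaction amount").
    """
    if end_balance == card_balance:
        # The valid set alone reproduces the card balance: suspects are duplicates.
        return "delete"
    if end_balance + sum(suspected_signed_amounts, Decimal("0")) == card_balance:
        # Adding the suspects back reaches the card balance: they are genuine.
        return "keep"
    # Neither check passes; the text routes this case to step 6 as well.
    return "delete"


print(resolve_suspected(Decimal("1500.00"), Decimal("1500.00"), [Decimal("-500.00")]))  # delete
print(resolve_suspected(Decimal("1500.00"), Decimal("1000.00"), [Decimal("-500.00")]))  # keep
```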
Step 8, end de-duplication.
Specifically, when step 8 is reached, de-duplication of all the bank fund flow data lacking transaction time is complete.

Claims (5)

1. A de-duplication method for bank fund flow data, characterized by comprising the following steps:
step 1, provide fund flow data in the receipt-and-payment format; if a check confirms that the data contains duplicates, go to step 2, otherwise go to step 8;
step 2, sort the fund flow data by transaction date and store the first record after sorting in a newly established valid fund flow data set as the current valid record;
step 3, compare the record that follows the current valid record in the sorted order with all records in the valid fund flow data set; if it is completely identical to any valid record, store it in a newly established suspected fund flow data set, otherwise go to step 4;
step 4, calculate the post-transaction balance of the current record, which is the balance after the current valid record plus the transaction amount of the current record; after this balance is calculated, store the current record in the valid fund flow data set as the new current valid record;
step 5, repeat steps 3 and 4 until the last record in the sorted order has been allocated to either the suspected or the valid fund flow data set, yielding the end-of-period balance; if the end-of-period balance matches the balance on the account card, go to step 6, otherwise go to step 7;
step 6, delete the records in the suspected fund flow data set as duplicate records and go to step 8;
step 7, judge the records in the suspected fund flow data set to be non-duplicate records and go to step 8;
step 8, end de-duplication.
2. The de-duplication method for bank fund flow data according to claim 1, wherein checking and confirming in step 1 that the receipt-and-payment fund flow data contains duplicates comprises the following steps:
step 1.1, based on the transaction dates of the receipt-and-payment data, determine the daily repeat ratio Nx, and from the daily ratios determine the number Dx of days on which duplicates exist;
step 1.2, when Dx is greater than 1, or the daily repeat ratio Nx is greater than the repeat-ratio threshold, confirm that the receipt-and-payment data contains duplicates.
3. The de-duplication method for bank fund flow data according to claim 2, wherein the daily repeat ratio Nx is the number of repeated transaction amounts on a given transaction date divided by the total number of transactions on that date.
4. The de-duplication method for bank fund flow data according to any one of claims 1 to 3, wherein the fund flow data in the receipt-and-payment format comprises a transaction date, a transaction counterparty, a transaction amount and a receipt/payment flag.
5. The de-duplication method for bank fund flow data according to any one of claims 1 to 3, wherein in step 5, when the end-of-period balance is not equal to the balance on the account card, the transaction amounts of all records in the suspected fund flow data set are added to the end-of-period balance; if the sum equals the balance on the card, step 7 is executed, otherwise the method goes to step 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111429072.4A CN114218201B (en) 2021-11-29 2021-11-29 Method for removing duplication of bank funds flow data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111429072.4A CN114218201B (en) 2021-11-29 2021-11-29 Method for removing duplication of bank funds flow data

Publications (2)

Publication Number Publication Date
CN114218201A 2022-03-22
CN114218201B 2024-10-15

Family

ID=80698720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111429072.4A Active CN114218201B (en) 2021-11-29 2021-11-29 Method for removing duplication of bank funds flow data

Country Status (1)

Country Link
CN (1) CN114218201B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100306088A1 (en) * 2008-08-01 2010-12-02 Hantz Group, Inc. Single or multi-company business accounting system and method for same including account number maintenance
CN112581268A (en) * 2019-09-30 2021-03-30 Beijing Chenrui Technology Co., Ltd. (北京宸瑞科技股份有限公司) Mass fund transaction data intelligence analysis method and system
US11094020B1 (en) * 2018-02-01 2021-08-17 Dailypay, Inc. Methods and apparatus for constructing machine learning models to process user data and provide advance access to payments

Non-Patent Citations (2)

Title
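Wu Zhitao (吴志涛): "Development and Application of a Network-Management Alarm Data Analysis Platform for Telecom Key Accounts" (电信大客户网管告警数据分析平台的开发与应用), China Master's Theses Full-text Database (Information Science and Technology), 15 May 2021 (2021-05-15), pages 138 - 626 *
Wang Hui (王慧): "Problems and Suggestions on Zero-Balance Account Management" (零余额账户管理的问题及建议), 金融科技时代 (lit. Era of Financial Technology), no. 12, 10 December 2014 (2014-12-10), pages 78 - 79 *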

Also Published As

Publication number Publication date
CN114218201B (en) 2024-10-15

Similar Documents

Publication Publication Date Title
CN109598095B (en) Method and device for establishing scoring card model, computer equipment and storage medium
CN113626527B (en) A financial data processing method and system
CN108596396B (en) Road surface performance prediction and maintenance method and device based on maintenance history correction
CN112734219B (en) Vehicle transportation running behavior analysis method and system
EP3038025A1 (en) Retention risk determiner
EP2357857B1 (en) Method and apparatus for generating phone bill
CN111061709A (en) An automatic method and system for data cleaning of similar repeated records
CN114647684A (en) Traffic prediction method and device based on stacking algorithm and related equipment
CN114218201A (en) Method for removing duplication of bank fund flow data
CN112213579A (en) Method and device for identifying faults of turnout switch machine
CN112527922B (en) Data warehouse incremental processing method based on invariant model
CN115422464A (en) Method and device for determining number of persons participating in sequence event and storage medium
CN117235062B (en) Service system data modeling method based on data center
CN107783896B (en) Optimization method and device of data processing model
CN116775632A (en) Near-real-time cleaning data execution method based on vehicle-mounted terminal acquisition data
CN112632953B (en) Method for rapidly and accurately detecting that multiple uploaded bill of materials belongs to same product
CN111461617B (en) Inventory counting method and device, computer equipment and storage medium
CN109685453B (en) Method for intelligently identifying effective paths of workflow
CN116501740A (en) Method, device, equipment and storage medium for creating database table partitions
TWI628607B (en) Accounting processing notice billing re-validation method and system
CN114399362A (en) Expert extraction encryption method for integrating government procurement expert credit rating
CN110222078B (en) Data processing method and device
CN113138980A (en) Data processing method, device, terminal and storage medium
JP2019109840A (en) Parts trace management system and parts trace management method
CN114564521A (en) Method and system for determining working time period of agricultural machine based on clustering algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant