Disclosure of Invention
The invention provides a bit currency address mining method and device for a digital currency exchange, which are used for solving the technical problems that the existing bit currency exchange address identification and classification method depends on more resources, manual data labeling is needed, the process of feature extraction is complex, and the method cannot be used in a large range.
The invention provides a bit currency address mining method of a digital currency exchange, which comprises the following steps:
acquiring transaction information on a bitcoin blockchain, and acquiring a plurality of initial addresses according to the transaction information; the initial address is the address with the most bitcoins;
constructing a bitcoin transfer network for each of the initial addresses based on the transaction information;
clustering addresses in the bitcoin transfer network based on a preset heuristic rule to obtain an address set of each initial address; wherein the addresses in the address set belong to the same digital currency exchange;
training an address classification model, and classifying the bit currency addresses in the address set through the address classification model to obtain the category of the bit currency addresses in the address set.
Optionally, the step of constructing a bitcoin transfer network for each of the initial addresses based on the transaction information includes:
and constructing the bitcoin transfer network of each initial address by adopting a breadth-first search algorithm based on the transaction information.
Optionally, the preset heuristic rule includes: address clustering heuristic based on multiple input transactions or address clustering heuristic based on change addresses.
Optionally, the step of training the address classification model, classifying the bitcoin addresses in the address set by the address classification model, and obtaining the category of the bitcoin addresses in the address set includes:
extracting historical transaction records of bitcoin addresses in the address set, and constructing training characteristics according to the historical transaction records;
carrying out normalization processing on the training features to obtain normalized features;
training a preset initial address classification model by adopting the normalized features to obtain an address classification model;
and classifying the bitcoin addresses in the address set through the address classification model to obtain the category of the bitcoin addresses in the address set.
The invention also provides a bit currency address mining device of the digital currency exchange, which comprises:
the initial address acquisition module is used for acquiring transaction information on the bitcoin blockchain and acquiring a plurality of initial addresses according to the transaction information; the initial address is the address with the most bitcoins;
the bit coin transfer network construction module is used for constructing a bit coin transfer network of each initial address based on the transaction information;
the clustering module is used for clustering the addresses in the bitcoin transfer network based on a preset heuristic rule to obtain an address set of each initial address; wherein the addresses in the address set belong to the same digital currency exchange;
and the classification module is used for training an address classification model, and classifying the bit currency addresses in the address set through the address classification model to obtain the category of the bit currency addresses in the address set.
Optionally, the bitcoin transfer network constructing module includes:
and the bit coin transfer network construction submodule is used for constructing the bit coin transfer network of each initial address by adopting an breadth-first search algorithm based on the transaction information.
Optionally, the preset heuristic rule includes: address clustering heuristic based on multiple input transactions or address clustering heuristic based on change addresses.
Optionally, the classification module includes:
the training feature construction submodule is used for extracting the historical transaction records of the bitcoin addresses in the address set and constructing training features according to the historical transaction records;
the normalization submodule is used for performing normalization processing on the training characteristics to obtain normalized characteristics;
the address classification model generation submodule is used for training a preset initial address classification model by adopting the normalized features to obtain an address classification model;
and the classification submodule is used for classifying the bitcoin addresses in the address set through the address classification model to obtain the category of the bitcoin addresses in the address set.
The invention also provides an electronic device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the bitcoin address mining method of a digital money exchange as described in any one of the above in accordance with instructions in the program code.
The present invention also provides a computer readable storage medium characterized in that the computer readable storage medium is used for storing program code for performing the bit currency address mining method of a digital currency exchange as described in any one of the above.
According to the technical scheme, the invention has the following advantages: the method comprises the steps of obtaining transaction information on a bitcoin blockchain, and obtaining a plurality of initial addresses according to the transaction information; constructing a bitcoin transfer network of each initial address based on the transaction information; clustering addresses in the bitcoin transfer network based on a preset heuristic rule to obtain an address set of each initial address; and classifying the bit currency addresses in the address set through a preset address classification model to obtain the category of the bit currency addresses in the address set. Compared with the prior art, the method has the advantages of less dependence on resources, no need of manual data annotation, simple process of feature extraction and wide application range.
Detailed Description
The embodiment of the invention provides a bit currency address mining method and device for a digital currency exchange, which are used for solving the technical problems that the existing bit currency exchange address identification and classification method depends on more resources, manual data labeling is needed, the process of feature extraction is complex, and the method cannot be used in a large range.
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of a bitcoin address mining method for a digital currency exchange according to an embodiment of the present invention.
The invention provides a bit currency address mining method of a digital currency exchange, which specifically comprises the following steps:
step 101, acquiring transaction information on a bitcoin blockchain, and acquiring a plurality of initial addresses according to the transaction information; the initial address is the address with the most bit coins;
102, constructing a bitcoin transfer network of each initial address based on transaction information;
103, clustering addresses in the bitcoin transfer network based on a preset heuristic rule to obtain an address set of each initial address; wherein, the addresses in the address set belong to the same digital currency exchange;
and 104, training an address classification model, and classifying the bit currency addresses in the address set through the address classification model to obtain the category of the bit currency addresses in the address set.
The method comprises the steps of acquiring transaction information on a bitcoin blockchain, and acquiring a plurality of initial addresses according to the transaction information; constructing a bitcoin transfer network of each initial address based on the transaction information; clustering addresses in the bitcoin transfer network based on a preset initiation rule to obtain an address set of each initial address; and classifying the bit currency addresses in the address set through a preset address classification model to obtain the category of the bit currency addresses in the address set. Compared with the prior art, the method has the advantages of less dependence on resources, no need of manual data marking, simple process of feature extraction and wide application range.
Referring to fig. 2, fig. 2 is a flowchart illustrating steps of a method for mining bitcoin addresses of digital currency exchanges according to another embodiment of the present invention. The method may specifically comprise the steps of:
step 201, acquiring transaction information on a bitcoin blockchain, and acquiring a plurality of initial addresses according to the transaction information; the initial address is the address with the most bit coins;
for exchanges, it is imperative that a large number of bits be stored using addresses, referred to as cold wallet addresses. A cold wallet address can easily find a combination (but a hot wallet address or a top-up wallet address is difficult to find because many are not used and are difficult to determine), and a candidate set containing the cold wallet address can be found from the K addresses with the largest number of bitcoin.
Step 202, based on transaction information, adopting a breadth-first search algorithm to construct a bitcoin transfer network of each initial address;
breadth-first search algorithm (BFS): it is a blind search method, which aims to systematically expand and check all nodes to find the result.
For a cold wallet address containing a large number of bitcoins, the primary task is to determine the source of the bitcoin, and thus to mine more information and find the address associated with this cold wallet address, i.e. the set of addresses controlled by the same entity. The specific operation is as follows:
constructing a bitcoin transfer network based on single transaction:
in the bitcoin transfer network, bitcoin addresses are used as nodes, and bitcoin transfer relations between input addresses and output addresses in transactions are used as edges. Because the bit currency transaction data only contains the number of bit currencies transferred in/out by one address and the specific number of the bit currencies flowing from each input address to each output address is not given, an edge exists between each input address and each output address in one transaction. Specifically, as shown in fig. 3, A, B, C, D is a node, and a, b, c, and d are edges.
Based on the principle of constructing the bitcoin transfer network by the single transaction, the bit coin transfer network is constructed for each initial address by adopting a breadth-first search algorithm. The input of the algorithm is as follows:
a: the starting point of the bit currency transfer network to be constructed is one of K initial addresses with the maximum bit currency quantity;
t: the initial address A is a transaction set of all received bitcoins;
n: depth used to control BFS search:
p: for controlling the conditions of pruning.
The output of the algorithm is: and the bitcoin transfer network G corresponding to the initial address A.
The construction process comprises the following steps: for the starting point a, the transaction set T is traversed, and the transaction corresponding to each input in each transaction (i.e. from which transaction the UTXO corresponding to this input comes) is added to the set S; in the process of each round of BFS search, elements in the queue Q are taken out one by one, when the queue Q is empty, all transactions in the set S are pressed into the queue Q, and the next round of BFS search is carried out; and when the search depth of the BFS search reaches the maximum depth N, ending the process.
Pruning: when the input contains too few bits during the traversal of each input of a transaction, the BFS search is not expanded and searched during the BFS search due to the time complexity.
In one example, the pseudo code for implementation of the algorithm may be as follows:
step 203, clustering addresses in the bitcoin transfer network based on a preset heuristic rule to obtain an address set of each initial address; wherein, the addresses in the address set belong to the same digital currency exchange;
in the bitcoin transfer network constructed in step 202, not all nodes (i.e., addresses) are controlled by the same entity. Therefore, it is necessary to filter the address set controlled by the same entity from the initial address a. In the embodiment of the invention, the clustering of the addresses can be realized by address clustering heuristic rules based on multi-input transaction or address clustering heuristic rules based on change-giving addresses.
And 204, training an address classification model, and classifying the bit currency addresses in the address set through the address classification model to obtain the category of the bit currency addresses in the address set.
After clustering, the addresses of the initial address a in the same cluster need to be further classified. The wallet framework of the exchange is divided into a cold wallet, a hot wallet and a recharging wallet. Of which the cold and hot wallets are of the most importance to the exchange and therefore there is a need to further distinguish to which type of wallet the addresses in the cluster belong.
In an embodiment of the present invention, step 204 may include the following sub-steps:
extracting historical transaction records of the bitcoin addresses in the address set, and constructing training characteristics according to the historical transaction records;
carrying out normalization processing on the training characteristics to obtain normalized characteristics;
training a preset initial address classification model by using the normalized features to obtain an address classification model;
and classifying the bit currency addresses in the address set through an address classification model to obtain the category of the bit currency addresses in the address set.
In practical application, the training process for the address classification model is as follows:
firstly, an address set V in a cluster obtained by clustering is given, and the historical transaction record set of each address in the cluster is TaWherein a ∈ V. X is formed by R|V|×dIs the feature space constructed from the historical transaction records for each bitcoin address in V, d is the size of the feature space for each feature vector, and R is the real number set. Y is a set of tags whose values may include 0 (cold wallet address), 1 (hot wallet address), 2 (top up wallet address). The training of the address classification model is to learn a mapping function f from the feature space to the label set: x → Y.
The features used by the address classification model include:
the current balance of the address;
the total number of bitcoin received and sent by the address;
the total number of bit coins received by the address;
the total number of the transmitted bitcoins of the address;
the total transaction number of addresses;
the transaction amount of the total received bitcoin of the address;
the transaction number of the total sending bitcoin of the address;
UTXO number of addresses.
Due to the large scale and large difference between the trading volume of different exchanges, the above used features need to be normalized in order to make the trained address classification model have better universality. The procedure for normalization is as follows:
for each feature x, its normalized value x' is:
wherein x ismin,vIs the minimum value of the characteristic x of each address in the address set V of the cluster, xmax,vIs the maximum value of the characteristic x of each address in the set of addresses V for that cluster. After normalization, the range of x' is [0,1 ]]。
In the embodiment of the invention, the training of the address classification model can be realized by adopting a CART classification tree model, and the training method comprises the following steps:
the input of the algorithm is a training set x (namely the characteristic space of the structure), a threshold value g of a Gini coefficient and a threshold value t of the number of samples; the output is a decision tree T, the process is as follows:
1) if the sample set of the current node is D, if the number of the samples in D is less than a threshold value t or no characteristic exists, the current node is not split any more, and the current decision tree sub-tree is returned;
2) calculating the kini coefficient of the sample set D, if the kini coefficient is smaller than a threshold value g, the current node is not split, and the current decision tree subtree is returned;
3) calculating the Gini coefficient of each characteristic value pair set D of each existing characteristic of the current node, selecting the characteristic B with the minimum Gini coefficient and the corresponding characteristic value B, dividing the sample set D into two parts D1 and D2 according to the optimal characteristic and the optimal characteristic value, and splitting the current node at the same time, wherein the sample set of the left node is D1, and the sample set of the right node is D2;
4) calling the left and right child nodes in a recursive manner in steps 1) -3) to generate a decision tree;
5) generating decision trees with various pruning effects based on the decision trees generated in the steps 1) -4), then testing the various pruning effects by using cross validation, and selecting the pruned decision tree with the best generalization prediction capability as a final CART tree.
After the address classification model is obtained through training, the address classification model is adopted to classify the bit currency addresses in the address set, and the address category to which each bit currency belongs can be determined. Wherein the address categories include a cold wallet address, a hot wallet address, and a top-up wallet address.
The method comprises the steps of acquiring transaction information on a bitcoin blockchain, and acquiring a plurality of initial addresses according to the transaction information; constructing a bitcoin transfer network of each initial address based on the transaction information; clustering addresses in the bitcoin transfer network based on a preset initiation rule to obtain an address set of each initial address; and classifying the bit currency addresses in the address set through a preset address classification model to obtain the category of the bit currency addresses in the address set. Compared with the prior art, the method has the advantages of less dependence on resources, no need of manual data marking, simple process of feature extraction and wide application range.
Referring to fig. 4, fig. 4 is a block diagram illustrating a bitcoin address mining apparatus of a digital currency exchange according to an embodiment of the present invention.
The invention provides a bit currency address mining device of a digital currency exchange, which specifically comprises the following steps:
an initial address obtaining module 401, configured to obtain transaction information on a bitcoin blockchain, and obtain multiple initial addresses according to the transaction information; the initial address is the address with the most bit coins;
a bitcoin transfer network construction module 402, configured to construct a bitcoin transfer network for each initial address based on the transaction information;
a clustering module 403, configured to cluster addresses in the bitcoin transfer network based on a preset heuristic rule to obtain an address set of each initial address; wherein the addresses in the address set belong to the same digital currency exchange;
the classification module 404 is configured to train an address classification model, and classify the bit currency addresses in the address set through the address classification model to obtain the category of the bit currency addresses in the address set.
In this embodiment of the present invention, the bitcoin transfer network constructing module 402 includes:
and the bit coin transfer network construction submodule is used for constructing the bit coin transfer network of each initial address by adopting a breadth-first search algorithm based on the transaction information.
In the embodiment of the present invention, the presetting of the heuristic rule includes: address clustering heuristic based on multiple input transactions or address clustering heuristic based on change addresses.
In an embodiment of the present invention, the classification module 404 includes:
the training characteristic construction submodule is used for extracting the historical transaction records of the bitcoin addresses in the address set and constructing training characteristics according to the historical transaction records;
the normalization submodule is used for performing normalization processing on the training characteristics to obtain normalized characteristics;
the address classification model generation submodule is used for training a preset initial address classification model by adopting the normalized features to obtain an address classification model;
and the classification submodule is used for classifying the bit currency addresses in the address set through the address classification model to obtain the classes of the bit currency addresses in the address set.
The invention also provides an electronic device comprising a processor and a memory:
the memory is used for storing the program codes and transmitting the program codes to the processor;
the processor is configured to perform the bitcoin address mining method of the digital currency exchange of an embodiment of the present invention according to instructions in the program code.
The present invention also provides a computer-readable storage medium for storing program code for performing the bitcoin address mining method of the digital money exchange of the embodiments of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.