CN109271390B

CN109271390B - A neural network-based index data structure and its data retrieval method

Info

Publication number: CN109271390B
Application number: CN201811152887.0A
Authority: CN
Inventors: 闫柳; 李卓; 刘开华
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2018-09-30
Filing date: 2018-09-30
Publication date: 2022-03-01
Anticipated expiration: 2038-09-30
Also published as: CN109271390A

Abstract

The invention discloses an index data structure based on a neural network, including a data mapping unit and a dynamic indexing unit; the data mapping unit is a data mapping unit based on a neural network model, collects name data in a named data network as a sample, and calculates the cumulative The distribution function value is used as a label, and the neural network model is obtained by training, which is used to map the name data to the corresponding position in the improved bitmap data structure; the dynamic index unit is a dynamic index unit based on the improved bitmap data structure. The bitmap is evenly divided into several buckets, and the size of each slot in the bucket is expanded to obtain an improved bitmap data structure that can be dynamically labeled, which is used to store the address offset corresponding to the name data. Aiming at the characteristics of the forwarding plane of the named data network, the invention can improve the storage efficiency and realize the fast data insertion and retrieval operations under the condition of ensuring the retrieval speed and the probability of misjudgment.

Description

Index data structure based on neural network and data retrieval method thereof

Technical Field

The invention belongs to the field of high-performance router index Data structure design, and particularly aims at the problems of efficient storage and quick retrieval of index contents in a Named Data Networking (Named Data Networking) forwarding plane.

Background

The continuous emergence of innovative applications, technologies and computing modes such as ultra-high definition videos, artificial intelligence and cloud computing accelerates the role transition of the internet from a communication channel to a data processing platform. The defects of the existing IP internet system structure based on the equipment address in the aspects of information sharing, mobility, safety, expandability and the like become a great difficult problem for hindering the development of the internet. Therefore, a new future internet architecture, named data network, was proposed in 2010 and has gained wide attention from both national and foreign academic circles.

Named data networks use a fully data content oriented communication mode with data names instead of IP addresses. The method has the advantages that the buffer memory is arranged in the routing node, so that the data content sharing in the real sense is realized, the network load is greatly reduced, and the network data transmission rate is effectively improved. It is therefore considered one of the most promising developments in the field of future internet architectures.

However, named data networks also face a series of challenges to be solved, especially for efficient storage and fast retrieval of indexed content in the forwarding plane. The index data structure is the key for improving the performance of the forwarding plane, but the current main research results have advantages and disadvantages. For example, data structure lookup based on a dictionary tree is slow; data structures based on bloom filters cannot directly index data; whereas hash table based data structures require a very large memory space. Therefore, the current research results cannot meet the requirements of the named data network forwarding plane on the retrieval speed and the storage space at the same time, and a new research idea needs to be provided urgently to design an index data structure with excellent comprehensive performance and a data retrieval algorithm thereof so as to solve the problem.

Disclosure of Invention

In order to solve the problems, the invention provides an index data structure based on a neural network and a data retrieval method thereof. Aiming at the characteristics of a forwarding plane of a named data network, the invention can improve the storage efficiency and realize rapid data insertion and retrieval operation under the condition of ensuring the retrieval speed and the misjudgment probability.

In order to solve the technical problem, the invention provides an index data structure based on a neural network, which comprises a data mapping unit and a dynamic index unit; the data mapping unit is a data mapping unit based on a neural network model, name data in a named data network are collected as samples, cumulative distribution function values of the name data are calculated to serve as labels, a back propagation neural network is trained according to the samples and the labels, and the neural network model is obtained and used for mapping the name data to be retrieved to corresponding positions in an improved bitmap data structure; the dynamic index unit is based on the dynamic index unit of the improved bitmap data structure, the traditional bitmap data structure is averagely divided into a plurality of barrels, the size of each groove in each barrel is expanded, and the improved bitmap data structure with dynamic labels is obtained and is marked as D-bitmap and used for storing the address offset corresponding to the name data to be retrieved.

The invention also provides a data retrieval method of the index data structure based on the neural network, which mainly comprises the steps of inserting data into the index data structure and carrying out data retrieval on the index data structure after the data is inserted; the method comprises the following specific steps:

step one, inserting name data in the index data structure, wherein each time one name data is inserted, the method comprises the following steps:

step 1-1: inputting name data: inputting name data to be inserted into the index data structure;

step 1-2: calculating the neural network of the data mapping unit: inputting the name data after fixed length processing into the neural network model for operation to obtain a real numerical value ranging from 0 to 1;

step 1-3: position mapping calculation of the data mapping unit: multiplying the calculation result of the neural network by the total number of the grooves of the D-bitmap to obtain the position of the name data mapped on the D-bitmap, namely the groove sequence number of the D-bitmap;

step 1-4: and calculating the bucket sequence number of the dynamic index unit: dividing the number of the grooves of each barrel by the number of the grooves of each barrel, and rounding downwards to obtain the barrel number of the position;

step 1-5: maximum index lookup of dynamic index units: traversing all the grooves in the barrel according to the barrel serial numbers obtained in the step 1-4, and searching the existing maximum label;

step 1-6: in-slot index entry for dynamic index unit: judging whether the existing maximum label found in the steps 1-5 reaches the maximum value, if not, recording the label of the existing maximum label plus one in the position groove, otherwise, taking the earliest deleted label from the deletion queue as the label in the position groove;

step 1-7: applying for a storage unit for the named data: applying a storage unit for the name data in a data storage space according to the base address corresponding to the barrel serial number in the dynamic index unit and the address offset represented by the in-slot label;

step 1-8: the name data insertion operation is finished;

step two, retrieving one name data in an index data structure, comprising the following steps:

step 2-1: inputting name data: inputting name data to be inserted into the index data structure;

step 2-2: calculating the neural network of the data mapping unit: inputting the name data after fixed length processing into the neural network model for operation to obtain a real numerical value ranging from 0 to 1;

step 2-3: position mapping calculation of the data mapping unit: multiplying the calculation result of the neural network by the total number of the grooves of the D-bitmap to obtain the position of the name data mapped on the D-bitmap, namely the groove sequence number of the D-bitmap;

step 2-4: judging the existence of data of the dynamic index unit: judging whether the label at the position is 0 or not, if not, the name data exists in the index data structure, and continuing to execute the step 2-5 to finish the retrieval; otherwise, the name data does not exist in the index data structure, namely, the name data indicates that no retrieval result exists, and the step 2-7 is executed;

step 2-5: and calculating the bucket sequence number of the dynamic index unit: dividing the number of the grooves of each barrel by the number of the grooves of each barrel, and rounding downwards to obtain the barrel number of the position;

step 2-6: and outputting a retrieval result: outputting the barrel serial number and the in-slot label of the position, namely the base address of the data storage unit and the address offset of the name data relative to the base address in the data storage space;

step 2-7: the name data retrieval operation ends.

Compared with the prior art, the invention has the beneficial effects that:

the indexing data structure based on the neural network and the data retrieval method thereof are deployed and realized on a computer configured as Intel Xeon E5-1650 v23.50GHz and DDR 324 GB SDRAM. The name data for training can be collected through a forwarding plane of a named data network in an actual environment, and in a laboratory environment, one hundred million pieces of name data with similar formats are constructed by using common English words and top-level domain names, and a neural network model capable of realizing uniform mapping is obtained through training. In the performance test, considering that the data volume processed by a named data network forwarding plane is in the million level, one million pieces of name data with the same distribution rule are used for testing the performance of the index data structure and the data retrieval method thereof. Experimental results show that the storage consumption of the method is far less than that of the Hash index, the retrieval speed is equivalent to that of the Hash index, and the communication requirement that the misjudgment rate of the current Internet is lower than 1% can be met. Therefore, the index data structure based on the neural network and the data retrieval method thereof, which are designed by the invention, can improve the storage efficiency under the condition of ensuring the retrieval speed and the misjudgment probability, and have good performance.

Drawings

FIG. 1 is a schematic diagram of an indexing data structure based on a neural network according to the present invention;

FIG. 2 is a diagram illustrating the specific steps of the data mapping unit in the neural network-based index data structure according to the present invention;

FIG. 3 is a block diagram of a data insertion operation in the data retrieval method of the present invention;

FIG. 4 is a block diagram of a data retrieval method according to the present invention.

Detailed Description

The technical solutions of the present invention are further described in detail with reference to the accompanying drawings and specific embodiments, which are only illustrative of the present invention and are not intended to limit the present invention.

As shown in fig. 1, the present invention provides an index data structure based on a neural network, which includes a data mapping unit and a dynamic index unit; the concrete description is as follows:

the data mapping unit is a data mapping unit based on a Neural Network model, name data in a named data Network are collected as samples, cumulative distribution function values of the name data are calculated as labels, a Back Propagation Neural Network (BPNN) is trained according to the samples and the labels, and the Neural Network model is obtained and used for mapping the name data to be retrieved to corresponding positions in an improved bitmap data structure. The collection of the name data in the named data network is realized in a forwarding plane in the actual deployment of the named data network, and a sufficient amount of sample data is obtained by recording all the name data processed by the forwarding plane within a certain time. The reason why the cumulative distribution function value thereof is calculated as a label is that the distribution of values of data subjected to the transformation of the cumulative distribution function, which is subjected to arbitrary distribution, is subjected to uniform distribution between 0 and 1. Therefore, the collected name data is used as a sample, and the cumulative distribution function value is calculated to be used as a label, so that the uniform mapping of the data can be realized, and the utilization efficiency of the storage unit is optimized.

The specific steps of the data mapping unit are shown in fig. 2. In the training stage, the name data serving as a sample and the cumulative distribution function value serving as a label jointly form a training data set, the training data set is used for training the back propagation neural network BPNN, the cumulative distribution function of the training data set is fitted, the relevant parameters such as the weight and the offset of the hidden layer and the output layer are obtained, and a neural network model capable of achieving uniform mapping is formed. And in the actual prediction stage after training, inputting a name data, and multiplying the value obtained by the operation of the neural network model by the mapping size to obtain the mapping position of the name data.

The dynamic index unit is based on the dynamic index unit of the improved bitmap data structure, the traditional bitmap data structure is averagely divided into a plurality of barrels, the size of each groove in each barrel is expanded, and the improved bitmap data structure with dynamic labels is obtained and is marked as D-bitmap and used for storing the address offset corresponding to the name data to be retrieved. The specific method for improving the traditional bitmap data structure is that the bitmap with the size of m slots is averagely divided into n buckets, each bucket contains m/n slots and corresponds to a data storage space with a fixed base address. The size of each slot is expanded from the original 1 bit to j bits for recording the address offset. The improved D-bitmap dynamically marks the labels in the data insertion process, and the label in each slot represents the sequence of the name data entering the bucket, namely the address offset of the name data in the storage space. And applying for a storage unit for the name data according to the base address corresponding to the bucket and the address offset represented by the mark in the slot, thereby realizing the dynamic address allocation of the data storage unit.

The data retrieval method aiming at the index data structure based on the neural network comprises the steps of inserting data into the index data structure and carrying out data retrieval on the index data structure after the data is inserted; the method comprises the following specific steps:

step one, inserting name data in the index data structure, and as shown in fig. 3, each time inserting a name data, including the following steps:

step 1-8: the name data insertion operation is finished;

in the present invention, an example of name data insertion into an index data structure is shown in fig. 1. In the index data structure, a data mapping unit is a trained neural network model, and a dynamic index unit is an improved bitmap data structure D-bitmap. The number of the slots is set to be 32, and each 16 slots are divided into one bucket, so that the D-bitmap is divided into 2 buckets in total, and the D-bitmap corresponds to 2 data storage spaces with fixed base addresses. Currently the index data structure has two name data inserted. And when the third name data/ndn/name/3 rd is inserted, performing neural network calculation and position mapping calculation in the data mapping unit to obtain the mapping position of the name data in the D-bitmap as the 6 th slot. And entering a dynamic index unit, and calculating the barrel serial number of the position to be 1. The current maximum label in the 1 st bucket is searched, the search result is 1, and the maximum value is not reached yet, so that the label of the current maximum label plus one is marked in the 6 th slot and is 2, that is, the address offset of the name data in the data storage unit relative to the 1 st base address.

Step two, retrieving a name data in the index data structure, as shown in fig. 4, including the following steps:

step 2-7: the name data retrieval operation ends.

Example (b): the indexing data structure based on the neural network and the data retrieval method thereof are deployed and realized on a computer configured as Intel Xeon E5-1650 v23.50GHz and DDR 324 GB SDRAM. The name data for training can be collected through a forwarding plane of a named data network in an actual environment, and in a laboratory environment, one hundred million pieces of name data with similar formats are constructed by using common English words and top-level domain names, and a neural network model capable of realizing uniform mapping is obtained through training. In the performance test, considering that the data volume processed by a named data network forwarding plane is in the million level, one million pieces of name data with the same distribution rule are used for testing the performance of the index data structure and the data retrieval method thereof. Experimental results show that the storage consumption of the method is far less than that of the Hash index, the retrieval speed is equivalent to that of the Hash index, and the communication requirement that the misjudgment rate of the current Internet is lower than 1% can be met. Therefore, the index data structure based on the neural network and the data retrieval method thereof, which are designed by the invention, can improve the storage efficiency under the condition of ensuring the retrieval speed and the misjudgment probability, and have good performance.

While the present invention has been described with reference to the accompanying drawings, the present invention is not limited to the above-described embodiments, which are illustrative only and not restrictive, and various modifications which do not depart from the spirit of the present invention and which are intended to be covered by the claims of the present invention may be made by those skilled in the art.

Claims

1. A data retrieval method based on the index data structure of the neural network, the index data structure adopted includes data mapping unit and dynamic index unit; the data mapping unit is a data mapping unit based on a neural network model, name data in a named data network are collected as samples, cumulative distribution function values of the name data are calculated to serve as labels, a back propagation neural network is trained according to the samples and the labels, and the neural network model is obtained and used for mapping the name data to be retrieved to corresponding positions in an improved bitmap data structure; the dynamic index unit is based on the dynamic index unit of the improved bitmap data structure, averagely divides the traditional bitmap data structure into a plurality of barrels, expands the size of each groove in each barrel to obtain the improved bitmap data structure with dynamic labels, and records the improved bitmap data structure as D-bitmap for storing the address offset corresponding to the name data to be retrieved;

the data retrieval method is characterized by comprising the steps of inserting data into an index data structure, and retrieving the data of the index data structure after the data are inserted; the method comprises the following specific steps:

step 1-8: the name data insertion operation is finished;

step 2-7: the name data retrieval operation ends.