[go: up one dir, main page]

CN116521803B - Track similarity query method and device based on data federation - Google Patents

Track similarity query method and device based on data federation

Info

Publication number
CN116521803B
CN116521803B CN202310222854.3A CN202310222854A CN116521803B CN 116521803 B CN116521803 B CN 116521803B CN 202310222854 A CN202310222854 A CN 202310222854A CN 116521803 B CN116521803 B CN 116521803B
Authority
CN
China
Prior art keywords
similarity
trajectory
grid
data
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310222854.3A
Other languages
Chinese (zh)
Other versions
CN116521803A (en
Inventor
彭智勇
吴晨
王胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202310222854.3A priority Critical patent/CN116521803B/en
Publication of CN116521803A publication Critical patent/CN116521803A/en
Application granted granted Critical
Publication of CN116521803B publication Critical patent/CN116521803B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29Geographical information databases
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Remote Sensing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a track similarity query method and device based on a data federation, and relates to the field of track data mining and information retrieval, wherein the query method comprises the steps of numbering mobile equipment in the data federation; based on the federal index and the number, searching the preset number of mobile devices most similar to the track to be queried by using a dynamic branch reduction algorithm. The method and the system construct the federal index by utilizing the space-time characteristics of the track, filter out the mobile equipment which does not meet the query condition by pruning the upper boundary of the similarity, reduce the communication expense, prune the track data to be queried based on the space-time characteristics of the track data, reduce the expense of local calculation, and terminate the query flow in advance by the central server according to the dynamic pruning condition, thereby further improving the query efficiency.

Description

Track similarity query method and device based on data federation
Technical Field
The invention relates to the field of track data mining and information retrieval, in particular to a track similarity query method and device based on data federation.
Background
Track similarity query is an important means for space-time track data analysis, and track data with the most similarity with a target track is quickly searched through predefined track similarity. The similarity query has important significance for subsequent track data analysis, including track clustering, track recommendation and the like. In addition, the track similarity query is also widely applied to the fields of traffic logistics (path planning), sociology and the like. The track similarity query has important application value.
Data federation is a well-established method of data integration. In the unified data model, data owners within the federation hold a portion of data that interact with a central server and provide query services to the outside. In federal query, a central server in the federal receives a query request from a user and forwards the query request to a related data owner, the data owner processes the query request and returns a query result to the central server, and the central server collates the query result and returns a final result to the user. In the process, the user does not directly interact with the data owners in the federation, so that the data privacy of the data owners is protected to a certain extent.
The traditional data federation is mainly based on enterprise-level application, is mainly applied to server-side programs, and has a large data scale. With the development of mobile technology, the computing power and capacity of terminal devices such as mobile phones are gradually improved, the managed data volume is also gradually increased, and on the other hand, the application of positioning technology on mobile phones also enables daily track data of people to be collected and managed. The track data are analyzed, and particularly the track similarity query can be applied to the fields of infectious disease contact tracing, traffic path planning and the like, so that the social treatment level is improved. However, daily trajectory data often involves personal privacy, and it is difficult for researchers to directly acquire and analyze the data, which is also a difficulty in analysis of movement trajectory data.
Disclosure of Invention
Aiming at the defects existing in the prior art, the invention aims to provide a track similarity query method and device based on data federation, mobile equipment which does not meet query conditions is filtered through similarity upper bound pruning, communication expenditure is reduced, meanwhile, track data to be queried is pruned based on space-time characteristics of the track data, the cost of local calculation is reduced, a central server side can terminate a query flow in advance according to dynamic pruning conditions, and query efficiency is further improved.
In order to achieve the above purpose, the invention adopts the following technical scheme:
numbering mobile devices in the data federation;
Constructing a federal index by a central server and based on a spatial grid;
Based on the federal index and the number, searching a preset number of mobile devices most similar to the track to be queried by using a dynamic branch reduction algorithm.
Based on the technical scheme, the mobile equipment in the data federation is numbered, and the specific steps comprise:
The mobile device sends registration information to the central server, and the central server numbers all the mobile devices in sequence according to the sequence of the received information.
On the basis of the technical scheme, the federal index is constructed by the central server based on the space grid, and the specific steps comprise:
The central server sends the space grid parameters to all the mobile devices;
After receiving the grid parameters, the mobile equipment anonymizes the space positions in the track data by using the space grids, and sends the anonymized track information to the central server;
The central server receives anonymized trajectory information, builds a time index over each pane of the spatial grid to record the dwell information of each device within that pane.
On the basis of the technical scheme, the space grid parameters comprise the space range of the grid and the side length of the grid inner square.
Based on the technical scheme, the method for searching the preset number of mobile devices most similar to the track to be queried by using a dynamic branch reduction algorithm based on the federal index and the number comprises the following specific steps:
the central server initializes a priority queue of a similarity upper bound, the size of which is the total number of mobile devices in the data federation, so as to record the similarity upper bound of the data track and the track to be queried in each mobile device and arrange the data tracks from large to small;
The central server calculates the upper limit of the similarity of the track data of the mobile equipment by using the federal index, and adds all the upper limit values of the similarity into the priority queue;
removing the equipment with the upper limit value of similarity of 0 in the priority queue, and locally calculating the similarity value between the rest mobile equipment and the track to be queried by the following calculation modes:
Wherein Sim is similarity, T 1 is the track to be queried, T 2 is the local track of the mobile device, pi i is the ith dwell point in track T 1, pi j is the jth dwell point in track T 2, τ is the time spent at the dwell point, ρ is the point of interest to which the dwell point corresponds, pi i. τ is the dwell time of the ith dwell point in track T 1, pi j. τ is the dwell time of the jth dwell point in track T 2, pi i.τ∩πj. τ is the common dwell time on the same point of interest;
The central server receives the similarity value sent by the mobile terminal, updates the similarity upper bound value in the priority list to be the similarity value of the corresponding equipment, and according to the received similarity value:
If the received similarity value is greater than the maximum similarity upper limit value, ending the query and returning a preset number of most similar mobile devices;
If the received similarity value is not greater than the similarity upper bound, judging whether the similarity values of all the mobile devices are received, if yes, ending the query and returning to the preset number of most similar devices, if not, returning to the execution center server to receive the similarity values sent by the mobile terminal, and updating the similarity upper bound in the priority list to be the similarity value of the corresponding device.
On the basis of the technical scheme, the central server calculates the similarity upper bound of the track data of the mobile equipment by using the federal index, and adds all similarity upper bound values into the priority queue, and the specific steps comprise:
the central server maintains a similarity upper limit value list;
traversing all stay points in the track to be queried, calculating the square of the stay point based on each stay point, and finding a corresponding interval tree in the federal index;
Finding all leaf nodes with intersections according to the interval information of the stay points, and updating a similarity upper limit value on corresponding equipment based on each leaf node, wherein the similarity upper limit value is calculated in the following way:
UBi=UBi+|π.τ∩Nleaf.τ|
Wherein UB i is the upper limit of the similarity of the ith mobile device, pi is the dwell point, τ is the dwell time, pi.τ is the dwell time of the dwell point, N leaf is the leaf node, N leaf.τ is the dwell time of the leaf node;
after the traversal is completed, all the similarity upper bound values in the similarity upper bound value list are added to the priority list.
On the basis of the above technical solution, after removing the device with the upper limit value of similarity of 0 in the priority queue, the method further includes:
According to the track data characteristics of the mobile equipment, irrelevant parts in query data are removed, so that efficiency of the mobile equipment in calculating the similarity is improved.
On the basis of the technical scheme, after the central server transmits the space grid parameters to all the mobile devices, the method further comprises the following steps:
the central server initializes a root node of the federation index, the root node stores m x n pointers, each pointer points to an interval tree, and the calculation method of m and n is as follows:
Wherein, R is the spatial range of the grid, the spatial range R is represented by longitude and latitude, lat min is the minimum of latitude, lat max is the maximum of latitude, r.lat max is the maximum of latitude of the grid space, r.lat min is the minimum of latitude of the grid space, lon min is the minimum of longitude, lon max is the maximum of longitude, r.lon max is the maximum of longitude of the grid space, r.lon min is the minimum of longitude of the grid space, and δ d is the side length of one square.
On the basis of the technical scheme, the mobile equipment receives grid parameters and anonymizes the space positions in the track data by utilizing grids, and the method specifically comprises the following steps:
The mobile equipment receives the grid parameters and determines grid division;
Traversing all stay points in the local track data and converting the stay points into corresponding footprint information in the square, namely The conversion mode is as follows:
where pi is the dwell point, τ is the dwell time, ρ is the point of interest of dwell point pi, R is the space grid range, δ d is the side length of the grid in the space grid, lon is the longitude, ρ.lon is the longitude of the point of interest ρ, lat is the latitude, ρ.lat is the latitude of the point of interest ρ, x is the offset of the point of interest on the longitude, y is the offset of the point of interest on the latitude, t is the space grid number, c t is the grid number, v is the total number of mobile devices in the data federation.
The invention also provides a track similarity query device based on the data federation, which comprises the following steps:
The execution module is used for numbering the mobile equipment in the data federation and constructing a federation index based on the space grid;
and the query module is used for querying the preset number of mobile devices most similar to the track to be queried by using a dynamic branch reduction algorithm based on the federal index constructed by the execution module.
Compared with the prior art, the method has the advantages that the federal index is built by utilizing the space-time characteristics of the track, the mobile equipment which does not meet the query conditions is filtered through pruning of the upper boundary of the similarity, communication expenditure is reduced, meanwhile, the track data to be queried is pruned based on the space-time characteristics of the track data, the expenditure of local calculation is reduced, and the central server side can terminate the query flow in advance according to the dynamic pruning conditions, so that the query efficiency is further improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a trace similarity query method based on data federation in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a specific flow of Federal index creation in an embodiment of the invention;
FIG. 3 is a schematic diagram of an index structure according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a root node of a federal index in accordance with an embodiment of the present invention;
FIG. 5 is a sample diagram of anonymization of trace data locations in an embodiment of the present invention;
fig. 6 is a schematic flow chart of a dynamic pruning algorithm in an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application.
Referring to fig. 1, an embodiment of the present invention provides a track similarity query method based on data federation, including the following steps:
S1, numbering mobile equipment in a data federation;
s2, constructing a federal index through a central server and based on a space grid;
and S3, searching the preset number of mobile devices most similar to the track to be queried by using a dynamic branch reduction algorithm based on the federal index and the number.
The track similarity query method based on the data federation comprises the steps of firstly recording and numbering mobile devices in the data federation, sending registration information to a central server by the mobile devices, numbering all the mobile devices according to the sequence of the received information, then sending space grid parameters to all the mobile devices by the central server, initializing root nodes N root of federation indexes, traversing all stay points in local track data after the mobile devices receive the space grid parameters, converting the stay points into footprint information corresponding to space grids, uploading all the footprint information to the central server by the mobile devices after conversion is completed, receiving the footprint information sent by the mobile devices by the central server, finding a corresponding interval tree in a root node of the federation index according to the space grid information, inserting the interval tree into leaf nodes of the interval tree according to a time interval in the footprint information, recording corresponding device numbers in the leaf nodes, completing construction of the federation index, and then using a dynamic pruning algorithm to find k mobile devices which are most similar to a track Q to be queried, wherein k is preset integer number.
In the invention, the mobile equipment in the data federation is numbered, and the specific steps include:
The mobile device sends registration information to the central server, and the central server numbers all the mobile devices in sequence according to the sequence of the received information.
That is, the mobile device transmits registration information to the central server, and the central server sequentially numbers all the mobile devices according to the sequence of receiving the information, for example, the mobile devices are numbered using S 1-Sv.
In the invention, the method for constructing the federal index based on the space grid through the central server comprises the following specific steps:
The central server sends the space grid parameters to all the mobile devices;
After receiving the grid parameters, the mobile equipment anonymizes the space positions in the track data by using the space grids, and sends the anonymized track information to the central server;
The central server receives anonymized trajectory information, builds a time index over each pane of the spatial grid to record the dwell information of each device within that pane.
That is, the central server transmits space grid parameters to all mobile devices in the data federation, the space grid parameters comprise a space range R of the space grid and a grid side length delta d in the space grid, the space range R= < lat min,latmax,lonmin,lonmax >, the mobile device receives the grid parameters and anonymizes the space position in track data of the mobile device by utilizing grids, wherein the track data T consists of a plurality of stay points pi, namely T= < pi 12,...,πx >, each stay point pi= < ρ, tau > records stay information of a target at an interest point, the stay information comprises the interest point rho and stay time tau, the space-time grid comprises a plurality of grids with the same size, namely G= < c 1,c2,...,cm*n >, and anonymization is to convert the stay points pi= < ρ, tau > into footprint information sigma= < c i, tau > on the corresponding grids, wherein c i is the square grid corresponding to the interest point in the stay points, and transmits the square grid to the server, and the central server receives anonymized track information, and constructs each square grid above the space to record stay information in the device.
In the invention, the space grid parameters comprise the space range of the grid and the side length of the grid inner square.
I.e. the spatial grid parameters comprise the spatial extent r= < lat min,latmax,lonmin,lonmax > of the grid and the side length delta d of the grid inner grid.
In the invention, based on federal index and number, a dynamic branch reduction algorithm is used for searching the mobile equipment with the number most similar to the track to be inquired, and the specific steps include:
the central server initializes a priority queue of a similarity upper bound, the size of which is the total number of mobile devices in the data federation, so as to record the similarity upper bound of the data track and the track to be queried in each mobile device and arrange the data tracks from large to small;
The central server calculates the upper limit of the similarity of the track data of the mobile equipment by using the federal index, and adds all the upper limit values of the similarity into the priority queue;
removing the equipment with the upper limit value of similarity of 0 in the priority queue, and locally calculating the similarity value between the rest mobile equipment and the track to be queried by the following calculation modes:
Wherein Sim is similarity, T 1 is a track to be queried, T 2 is a local track of the mobile device, pi i is an ith dwell point in a T 1 track, pi j is a jth dwell point in a track T 2, τ is a dwell time at the dwell point, ρ is a point of interest to which the dwell point corresponds more, pi i.τ∩πj, τ is a common dwell time at the same point of interest;
The central server receives the similarity value sent by the mobile terminal, updates the similarity upper bound value in the priority list to be the similarity value of the corresponding equipment, and according to the received similarity value:
If the received similarity value is greater than the maximum similarity upper limit value, ending the query and returning a preset number of most similar mobile devices;
If the received similarity value is not greater than the similarity upper bound, judging whether the similarity values of all the mobile devices are received, if yes, ending the query and returning to the preset number of most similar devices, if not, returning to the execution center server to receive the similarity values sent by the mobile terminal, and updating the similarity upper bound in the priority list to be the similarity value of the corresponding device.
The method comprises the steps that a central server initializes a priority queue L with an upper similarity bound, the size of the priority queue L is the total number v of mobile devices in a data federation, the priority queue L records the upper similarity bound of track data in each mobile device and tracks to be queried, the track data and the upper similarity bound of tracks to be queried are sorted according to the numerical value from large to small, the maximum upper similarity bound in the mobile devices is reserved by the head L.top of the queue, the central server calculates the upper similarity bound of the track data of the mobile devices by using the federal index, all the upper similarity bound values are added into the priority queue, for mobile devices with the upper similarity bound of 0, the mobile devices can be judged to be out of the priority queue, the mobile devices with the upper similarity bound of greater than 0 are removed, for the mobile devices with the upper similarity bound of greater than 0, the mobile devices to be queried, the mobile devices locally calculate the similarity values and send the similarity values to the central server, the central server receives the similarity values Sim of the similarity values sent by the mobile devices, the upper similarity values in the priority list are updated, the upper limit values in the priority list are the corresponding similarity values, when the upper similarity values received by the mobile devices are greater than the maximum similarity value Sim, and the maximum similarity value in the priority list is greater than the maximum similarity value is received, and the maximum similarity is equal to the maximum query result is not in the priority list, and the query result is directly is aborted, and query is sent by the mobile device.
The method and the system construct the federal index by utilizing the space-time characteristics of the track, filter out the mobile equipment which does not meet the query condition by pruning the upper bound of the similarity, reduce the communication expense, prune the track data to be queried based on the space-time characteristics of the track data, reduce the expense of local calculation, and terminate the query flow in advance by the central server according to the dynamic pruning condition, thereby further improving the query efficiency.
In the invention, the central server calculates the upper limit of the similarity of the track data of the mobile equipment by using the federal index, and adds all the upper limit values of the similarity into the priority queue, and the specific steps comprise:
the central server maintains a similarity upper limit value list;
traversing all stay points in the track to be queried, calculating the square of the stay point based on each stay point, and finding a corresponding interval tree in the federal index;
Finding all leaf nodes with intersections according to the interval information of the stay points, and updating a similarity upper limit value on corresponding equipment based on each leaf node, wherein the similarity upper limit value is calculated in the following way:
UBi=UBi+|π.τ∩Nleaf.τ|
Wherein UB i is the upper limit of the similarity of the ith mobile device, pi is the dwell point, τ is the dwell time, pi.τ is the dwell time of the dwell point, N leaf is the leaf node, N leaf.τ is the dwell time of the leaf node;
after the traversal is completed, all the similarity upper bound values in the similarity upper bound value list are added to the priority list.
The central server calculates the upper limit of the similarity of the track data of the mobile device by using the federal index, and adds all the upper limit values of the similarity to the priority queue, wherein a similarity upper limit value list UB= { UB 1,UB2,...,UBv } and a query data list QL= { Q 1,Q2,...,Qr } are maintained for the central server, all the stay points in the track Q to be queried are traversed, the square of the stay points pi= < ρ, τ > are calculated, corresponding interval trees in the federal index are found, all leaf nodes intersected with the stay points are found according to the interval information τ of the stay points, and the upper limit UB i=UBi+|π.τ∩Nleaf. τ| of the similarity of the corresponding device is updated for each leaf node N leaf = < i, τ >, and meanwhile the query data Q i=Qi { pi } of the corresponding mobile device is updated. After the traversal is completed, all upper bound values in list UB are added to priority queue L.
In the invention, after removing the device with the upper limit value of similarity of 0 in the priority queue, the method further comprises the following steps:
According to the track data characteristics of the mobile equipment, irrelevant parts in query data are removed, so that efficiency of the mobile equipment in calculating the similarity is improved.
That is, after removing the device with the upper limit value of similarity in the priority queue being 0, the central server will also remove the irrelevant part in the query data according to the track data feature of the mobile device, so as to improve the efficiency of the mobile device in calculating the similarity locally.
In the present invention, after the central server transmits the spatial grid parameters to all the mobile devices, the method further comprises:
the central server initializes a root node of the federation index, the root node stores m x n pointers, each pointer points to an interval tree, wherein the calculation method of m and n is as follows:
That is, after the central server sends the spatial grid parameters to all the mobile devices, the central server initializes a root node N root of the federal index, where the root node stores m×n pointers, each pointer pointing to an interval tree, where the calculation method of m and N is as follows:
Wherein, R is the spatial range of the grid, the spatial range R is represented by longitude and latitude, lat min is the minimum of latitude, lat max is the maximum of latitude, r.lat max is the maximum of latitude of the grid space, r.lat min is the minimum of latitude of the grid space, lon min is the minimum of longitude, lon max is the maximum of longitude, r.lon max is the maximum of longitude of the grid space, r.lon min is the minimum of longitude of the grid space, and δ d is the side length of one square.
In the invention, the mobile equipment receives grid parameters and anonymizes the space position in the track data by utilizing grids, and the method specifically comprises the following steps:
The mobile equipment receives the grid parameters and determines grid division;
Traversing all stay points in the local track data and converting the stay points into corresponding footprint information in the square, namely The conversion mode is as follows:
Namely, the mobile device receives the grid parameters and uses the specific steps of grid anonymization to the spatial position in the track data, namely, the mobile device receives the spatial grid parameters R and delta d, traverses all stay points in the local track data, converts the stay points into footprint information corresponding to the square, and converts pi= < ρ, tau > into sigma= < c i, tau > through the grid parameters R and delta d, namely The specific conversion mode is as follows:
where pi is the dwell point, τ is the dwell time, ρ is the point of interest of dwell point pi, R is the space grid range, δ d is the side length of the grid in the space grid, lon is the longitude, ρ.lon is the longitude of the point of interest ρ, lat is the latitude, ρ.lat is the latitude of the point of interest ρ, x is the offset of the point of interest on the longitude, y is the offset of the point of interest on the latitude, t is the space grid number, c t is the grid number, v is the total number of mobile devices in the data federation.
The embodiment of the invention also provides a track similarity query device based on the data federation, which comprises the following steps:
The execution module is used for numbering the mobile equipment in the data federation and constructing a federation index based on the space grid;
and the query module is used for querying the preset number of mobile devices most similar to the track to be queried by using a dynamic branch reduction algorithm based on the federal index constructed by the execution module.
The embodiment of the invention also provides a track similarity query device based on the data federation, which comprises an execution module and a query module, wherein the execution module is used for numbering mobile equipment in the data federation and constructing a federation index based on a space grid. The query module is used for querying a preset number of mobile devices most similar to the track to be queried by using a dynamic branch reduction algorithm based on the federal index constructed by the execution module.
The track similarity query method for the mobile device data federation provided by the embodiment of the invention is characterized in that each mobile device holds a local track data, a track Q to be queried and an integer k are given, in order to improve query efficiency, the invention designs a space-time federal index for efficiently pruning data to be queried. The flow of the query method of the invention is shown in figure 1 and is mainly divided into three steps:
Initializing federation, wherein all mobile devices send registration information to a central server, and the central server records and numbers the registration information;
constructing a federation index, wherein a central server sends space grid parameters to all mobile devices, the mobile devices carry out position anonymization on local track data according to the grid parameters, the interest points in the stay points are converted into corresponding space grids, the corresponding space grids are uploaded to the central server, the federation index is built, and a specific flow is described in the first embodiment;
And (3) performing query execution, namely performing similarity upper bound pruning by using the federal index by using the central server, filtering out equipment which does not meet query conditions, and then dynamically calculating the similarity upper bound by using a dynamic pruning algorithm and returning a final query result, wherein the detailed flow of the dynamic pruning algorithm is described in the second embodiment.
The invention will be described in more detail below by means of a number of examples.
In the first embodiment, the federation index is created, the specific flowchart is shown in fig. 2, the index structure is shown in fig. 3, and the detailed steps are as follows:
And S201, the central server sends grid parameters to all mobile devices and initializes the root node.
The federal index consists of a root node and a plurality of interval tree indexes recorded by the root node. Fig. 4 is a root node structure diagram of a federal index, which is composed of index header information and child node lists, wherein the index header information includes grid parameter information, namely a grid range R, a grid side length delta d in the grid and a grid total number m×n in the grid, and the child node list includes a plurality of list items, wherein each list item stores a corresponding grid id and a pointer pointing to a corresponding interval tree root node. The internal nodes of the interval tree store interval information, and the leaf nodes store id numbers of corresponding mobile devices in addition to the interval information.
And S202, the mobile equipment receives the grid parameters, performs anonymization processing and uploads the parameters to the central server.
The mobile equipment traverses the local track data, and uploads the corresponding footprint information to the central server after the position anonymization of the stay points in the local track data.
And S203, the central server receives the converted track data and updates the federation index.
And if the node information of the corresponding interval tree does not exist in the root node, a new interval tree is established and the root node is recorded in a root node pointer of the federation index. And finding the inner node of the lowest layer containing the footprint, adding a leaf node into the inner node, wherein the leaf node records the equipment id and the time interval corresponding to the footprint. When the track data processing of all the mobile devices is completed, the federation index is constructed.
In the second embodiment, the anonymization of the track data position is shown in fig. 5, and the specific steps are as follows:
Step one, the mobile equipment receives grid parameters and determines grid division.
After the mobile device receives the grid parameters, grid division in the longitude and latitude directions is calculated respectively. The spatial range in fig. 5 is divided into 4*4 squares, each of the squares is coded according to a scan line type, and the squares corresponding to each interest point in the trace stop points are calculated, and trace data in fig. 5 contains 5 stop points, and total 5 interest points ρ 15 correspond to 5 squares c 2、c6、c7、c12、c16 respectively.
Traversing the track data, and realizing position anonymization by using the space grid.
The basic idea of location anonymization is to replace the exact location information with a spatial extent to hide the true spatial location. In fig. 5, the positions of the interest points in the track stay points are replaced by the square obtained in the step one, the time interval is kept unchanged, the converted track is obtained, and the mobile device sends the converted track to the central server side to construct the federation index.
The third embodiment of the dynamic pruning algorithm comprises the following specific steps:
step one, initializing a similarity priority queue.
For the data federation with the number v of the mobile devices, a central server initializes a priority queue L with the same length, and sequentially puts the track similarity upper bound value of each mobile device into the L according to the size of the priority queue L, and the head of the queue L.top stores the maximum similarity upper bound.
And step two, receiving the similarity value sent by the mobile terminal.
After the mobile device calculates the similarity value locally, the computing result Sim i is sent to the central server, and after the mobile device receives the similarity value, the central server updates the corresponding similarity value in the L by using the similarity value, and dynamically adjusts the maximum similarity upper bound of the first L.top of the team so as to facilitate the subsequent judgment of whether the query termination condition is met.
And step three, judging whether the dynamic pruning condition is met.
When the received similarity value is greater than the maximum similarity upper bound, namely Sim i > L.top, the method indicates that the subsequently received similarity value cannot be higher than the similarity value of the currently received equipment, and step four is entered, otherwise, whether all the similarity values of all the mobile equipment are received is judged, if yes, the query is completed, step four is entered, and otherwise, step two is entered to continue the query operation.
And step four, returning a final query result.
After the inquiry is completed, the central server side sends inquiry end information to all the mobile devices, and returns the last k most similar devices to the user.
The specific flow of the dynamic pruning algorithm in the third embodiment is shown in fig. 6.
The foregoing is only a specific embodiment of the application to enable those skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (9)

1.一种基于数据联邦的轨迹相似度查询方法,其特征在于,包括以下步骤:1. A trajectory similarity query method based on data federation, characterized by comprising the following steps: 对数据联邦内的移动设备进行编号;Numbering mobile devices within the data federation; 通过中央服务器并基于空间网格构建联邦索引;Build a federated index based on a spatial grid through a central server; 基于联邦索引和编号,使用动态减枝算法查找与待查询轨迹最相似的预设个数的移动设备;Based on the federated index and number, a dynamic branch reduction algorithm is used to find a preset number of mobile devices that are most similar to the query trajectory. 所述基于联邦索引和编号,使用动态减枝算法查找与待查询轨迹最相似的预设个数的移动设备,具体步骤包括:The method uses a dynamic branch reduction algorithm based on the federated index and number to find a preset number of mobile devices that are most similar to the trajectory to be queried. The specific steps include: 中央服务器初始化一个相似度上界的优先权队列,其大小为数据联邦内的移动设备总数,以用于记录每个移动设备中的数据轨迹与待查询轨迹的相似度上界并从大到小排列;The central server initializes a priority queue with a similarity upper bound, whose size is the total number of mobile devices in the data federation. It is used to record the upper bound of the similarity between the data trajectory of each mobile device and the trajectory to be queried and sort them from large to small. 中央服务器利用联邦索引计算移动设备轨迹数据的相似度上界,并将所有相似度上界值添加到优先权队列中;The central server uses the federated index to calculate the upper bound of similarity of the mobile device trajectory data and adds all the upper bound values of similarity to the priority queue; 去除优先权队列中相似度上界值为0的设备,剩余移动设备在本地计算与待查询轨迹的相似度值,计算方式为:Remove the devices with a similarity upper limit of 0 in the priority queue, and the remaining mobile devices calculate the similarity value with the trajectory to be queried locally. The calculation method is: 其中,为相似度,为待查询轨迹,为移动设备本地轨迹,为轨迹中的第个停留点,为轨迹中第个停留点,为在停留点所停留的时间,为停留点多对应的兴趣点,为轨迹中的第个停留点的停留时间,为轨迹中第个停留点的停留时间,为在同一兴趣点上的共同停留时间;in, is the similarity, is the trajectory to be queried, is the local trajectory of the mobile device, For the trajectory The A stopover point, For the trajectory Middle A stopover point, is the time spent at the stop point, There are many points of interest corresponding to the stop points. For the trajectory The The duration of stay at a stop point, For the trajectory Middle The duration of stay at a stop point, The time spent together at the same point of interest; 中央服务器接收移动端发送的相似度值,并更新优先权列表中的相似度上界值为对应设备的相似度值,根据接收到的相似度值:The central server receives the similarity value sent by the mobile terminal and updates the upper similarity limit value in the priority list to the similarity value of the corresponding device. Based on the received similarity value: 若接收到的相似度值大于最大相似度上界值,则结束查询 并返回预设数量个最相似的移动设备;If the received similarity value is greater than the maximum similarity upper limit, the query ends and the preset number of most similar mobile devices are returned; 若接收到的相似度值不大于相似度上界,则判断所有移动设备的相似度值是否都被接收,若是,则结束查询并返回预设数量个最相似的设备,若否,则返回执行中央服务器接收移动端发送的相似度值,并更新优先权列表中的相似度上界值为对应设备的相似度值。If the received similarity value is not greater than the similarity upper bound, it is determined whether the similarity values of all mobile devices have been received. If so, the query is terminated and a preset number of most similar devices are returned. If not, the execution returns to the central server to receive the similarity value sent by the mobile terminal, and the similarity upper bound value in the priority list is updated to the similarity value of the corresponding device. 2.如权利要求1所述的一种基于数据联邦的轨迹相似度查询方法,其特征在于,所述对数据联邦内的移动设备进行编号,具体步骤包括:2. The trajectory similarity query method based on data federation according to claim 1, wherein the numbering of mobile devices in the data federation comprises the following steps: 移动设备向中央服务器发送登记信息,中央服务器根据接收信息的先后顺序依次对全体移动设备进行编号。The mobile device sends registration information to the central server, and the central server numbers all mobile devices in the order in which they are received. 3.如权利要求1所述的一种基于数据联邦的轨迹相似度查询方法,其特征在于,所述通过中央服务器并基于空间网格构建联邦索引,具体步骤包括:3. The trajectory similarity query method based on data federation according to claim 1, wherein the step of constructing a federated index based on a spatial grid through a central server comprises: 中央服务器将空间网格参数发送至全体移动设备;The central server sends the spatial grid parameters to all mobile devices; 移动设备接收网格参数后,将轨迹数据中的空间位置利用空间网格进行匿名化,并将匿名化的轨迹信息发送给中央服务器;After receiving the grid parameters, the mobile device anonymizes the spatial position in the trajectory data using the spatial grid and sends the anonymized trajectory information to the central server; 中央服务器接收匿名化的轨迹信息,在空间网格的每一个方格之上构建时间索引以记录每个设备在该方格内的停留信息。The central server receives anonymized trajectory information and builds a time index on each square of the spatial grid to record the stay information of each device in that square. 4.如权利要求3所述的一种基于数据联邦的轨迹相似度查询方法,其特征在于:4. The trajectory similarity query method based on data federation according to claim 3, characterized in that: 所述空间网格参数包括网格的空间范围和网格内方格的边长。The spatial grid parameters include the spatial extent of the grid and the side lengths of the cells within the grid. 5.如权利要求1所述的一种基于数据联邦的轨迹相似度查询方法,其特征在于,所述中央服务器利用联邦索引计算移动设备轨迹数据的相似度上界,并将所有相似度上界值添加到优先权队列中,具体步骤包括:5. The method for trajectory similarity query based on data federation according to claim 1, wherein the central server calculates the upper similarity bound of mobile device trajectory data using the federated index and adds all similarity upper bound values to a priority queue, specifically comprising the following steps: 中央服务器维护一个相似度上界值列表;The central server maintains a list of similarity upper bound values; 遍历待查询轨迹中所有的停留点,基于每个停留点计算其所在的方格并找到联邦索引中对应的区间树;Traverse all the stop points in the query trajectory, calculate the grid where each stop point is located, and find the corresponding interval tree in the federated index; 根据停留点的区间信息找到与其存在交集的所有叶节点,基于每个叶节点更新对应设备上的相似度上界值,相似度上界值计算方式为:According to the interval information of the stay point, find all the leaf nodes that intersect with it, and update the upper limit of similarity on the corresponding device based on each leaf node. The upper limit of similarity is calculated as follows: 其中,为第个移动设备的相似度上界值,为停留点,为停留时间,为停留点的停留时间,为叶节点,为叶节点的停留时间;in, For the The upper bound of similarity of mobile devices, For the stopover point, is the residence time, is the dwell time at the stop point, is a leaf node, is the residence time of the leaf node; 遍历完成后,将相似度上界值列表中的所有相似度上界值添加到优先权列表中。After the traversal is completed, all similarity upper bound values in the similarity upper bound value list are added to the priority list. 6.如权利要求1所述的一种基于数据联邦的轨迹相似度查询方法,其特征在于,在去除优先权队列中相似度上界值为0的设备之后,还包括:6. The method for trajectory similarity query based on data federation according to claim 1, characterized in that after removing the devices with a similarity upper limit of 0 in the priority queue, the method further comprises: 根据移动设备的轨迹数据特征,去除查询数据中的不相关部分,以提高移动设备计算相似度时的效率。According to the trajectory data characteristics of the mobile device, irrelevant parts of the query data are removed to improve the efficiency of similarity calculation of the mobile device. 7.如权利要求4所述的一种基于数据联邦的轨迹相似度查询方法,其特征在于,在中央服务器将空间网格参数发送至全体移动设备之后,还包括:7. The trajectory similarity query method based on data federation according to claim 4, characterized in that after the central server sends the spatial grid parameters to all mobile devices, it further comprises: 中央服务器初始化联邦索引的根节点,根节点存储个指针,每个指针指向一个区间树,的计算方法为:The central server initializes the root node of the federated index, which stores pointers, each pointer points to an interval tree, and The calculation method is: 其中,为网格的空间范围,空间范围由经纬度表示,为纬度的最小值,为纬度的最大值, 为网格空间的最大纬度,为网格空间的最小纬度,为经度的最小值,为经度的最大值,为网格空间的最大经度,为网格空间的最小经度,为一个方格的边长。in, is the spatial extent of the grid, spatial extent Expressed by longitude and latitude, is the minimum value of latitude, is the maximum value of latitude, is the maximum latitude of the grid space, is the minimum latitude of the grid space, is the minimum value of longitude, is the maximum value of longitude, is the maximum longitude of the grid space, is the minimum longitude of the grid space, is the side length of a square. 8.如权利要求4所述的一种基于数据联邦的轨迹相似度查询方法,其特征在于,所述移动设备接收网格参数并将轨迹数据中的空间位置利用网格匿名化,具体步骤包括:8. The method for trajectory similarity query based on data federation according to claim 4, wherein the mobile device receives grid parameters and anonymizes the spatial positions in the trajectory data using the grid, and the specific steps include: 移动设备接收网格参数,确定网格划分;The mobile device receives the grid parameters and determines the grid division; 遍历本地轨迹数据中的所有停留点,将其转化为在方格中对应的足迹信息,即,转换方式如下:Traverse all the stop points in the local trajectory data and convert them into corresponding footprint information in the grid, that is, , the conversion is as follows: 其中,为停留点,为停留时间,为停留点的兴趣点,为空间网格范围,为空间网格中方格的边长,为经度,为兴趣点的经度,为纬度,为兴趣点的纬度,为兴趣点在经度上的偏移量,为兴趣点在纬度上的偏移量,为空间网格编号,为编号为的方格,为数据联邦内移动设备总数。in, For the stopover point, is the residence time, For stopping point Points of interest, is the spatial grid range, is the side length of a cell in the spatial grid, is the longitude, Points of interest Longitude, is the latitude, Points of interest Latitude, is the offset of the point of interest in longitude, is the latitude offset of the point of interest, is the spatial grid number, For the number The square, is the total number of mobile devices in the data federation. 9.一种采用如权利要求1-8任一项所述的一种基于数据联邦的轨迹相似度查询方法的基于数据联邦的轨迹相似度查询装置,其特征在于,包括:9. A trajectory similarity query device based on data federation using the trajectory similarity query method based on data federation according to any one of claims 1 to 8, characterized by comprising: 执行模块,其用于对数据联邦内的移动设备进行编号,并基于空间网格构建联邦索引;An execution module, configured to number mobile devices within the data federation and construct a federation index based on a spatial grid; 查询模块,其用于基于所述执行模块构建的联邦索引,使用动态减枝算法查询出与待查询轨迹最相似的预设个数的移动设备。The query module is configured to query a preset number of mobile devices that are most similar to the trajectory to be queried using a dynamic branch pruning algorithm based on the federated index constructed by the execution module.
CN202310222854.3A 2023-03-09 2023-03-09 Track similarity query method and device based on data federation Active CN116521803B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310222854.3A CN116521803B (en) 2023-03-09 2023-03-09 Track similarity query method and device based on data federation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310222854.3A CN116521803B (en) 2023-03-09 2023-03-09 Track similarity query method and device based on data federation

Publications (2)

Publication Number Publication Date
CN116521803A CN116521803A (en) 2023-08-01
CN116521803B true CN116521803B (en) 2025-08-15

Family

ID=87396588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310222854.3A Active CN116521803B (en) 2023-03-09 2023-03-09 Track similarity query method and device based on data federation

Country Status (1)

Country Link
CN (1) CN116521803B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567497A (en) * 2011-12-23 2012-07-11 浙江大学 Inquiring method of best matching with fuzzy trajectory problems
CN107766406A (en) * 2017-08-29 2018-03-06 厦门理工学院 A kind of track similarity join querying method searched for using time priority

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9593957B2 (en) * 2010-06-04 2017-03-14 Microsoft Technology Licensing, Llc Searching similar trajectories by locations
US11994863B2 (en) * 2019-12-03 2024-05-28 International Business Machines Corporation Trajectory similarity search
CN115374840A (en) * 2022-07-29 2022-11-22 苏州大学 Trajectory similarity calculation method based on projection vector

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567497A (en) * 2011-12-23 2012-07-11 浙江大学 Inquiring method of best matching with fuzzy trajectory problems
CN107766406A (en) * 2017-08-29 2018-03-06 厦门理工学院 A kind of track similarity join querying method searched for using time priority

Also Published As

Publication number Publication date
CN116521803A (en) 2023-08-01

Similar Documents

Publication Publication Date Title
CN103747523B (en) A kind of customer location forecasting system and method based on wireless network
CN102291435B (en) Mobile information searching and knowledge discovery system based on geographic spatiotemporal data
Yan et al. SeMiTri: a framework for semantic annotation of heterogeneous trajectories
CN104462190B (en) A kind of online position predicting method excavated based on magnanimity space tracking
Xu et al. Taxi-RS: Taxi-hunting recommendation system based on taxi GPS data
CN103731916B (en) A kind of customer location forecasting system and method based on wireless network
CN110019616A (en) A kind of POI trend of the times state acquiring method and its equipment, storage medium, server
CN106649656A (en) Spatial-temporal trajectory big data storage method for database
CN108153867A (en) User trajectory Forecasting Methodology and device based on temporal regularity
CN108460102A (en) Social network data querying method, device, computer equipment and storage medium
CN102880709A (en) Data warehouse management system and data warehouse management method
CN103744861A (en) Lookup method and device for frequency sub-trajectories in trajectory data
CN105488211A (en) Method for determining user group based on feature analysis
CN107145526A (en) Geographical social activity keyword Reverse nearest neighbor inquiry processing method under a kind of road network
CN104615734B (en) A kind of community management service big data processing system and its processing method
CN105512301A (en) User grouping method based on social content
CN106372133A (en) Big data-based user behavior analysis processing method and system
CN114937306A (en) A target tracking method and system based on face clustering
CN119691292A (en) POI recommendation method integrating social relation network and space-time context information
Cai et al. Vector-based trajectory storage and query for intelligent transport system
CN117033541A (en) Space-time knowledge graph indexing method and related equipment
CN116521803B (en) Track similarity query method and device based on data federation
CN114253975B (en) A load-aware road network shortest path distance calculation method and device
Bakkal et al. Modeling and querying trajectories using Neo4j spatial and TimeTree for carpool matching
CN106682168A (en) Construction method of visual cross-region urban data query system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant