CN120929541A

CN120929541A - Traffic knowledge graph construction method, device, equipment and medium

Info

Publication number: CN120929541A
Application number: CN202511452894.2A
Authority: CN
Inventors: 刘祥; 王成龙; 刘晓冰; 朱文霖; 张超; 杨建新; 吕晓飞
Original assignee: Qingdao Guochuang Wisdom Cloud Brain Technology Co ltd; Hisense TransTech Co Ltd
Current assignee: Qingdao Guochuang Wisdom Cloud Brain Technology Co ltd; Hisense TransTech Co Ltd
Priority date: 2025-10-13
Filing date: 2025-10-13
Publication date: 2025-11-11
Anticipated expiration: 2045-10-13
Also published as: CN120929541B

Abstract

The present application relates to the field of intelligent traffic technologies, and in particular, to a method, an apparatus, a device, and a medium for constructing a traffic knowledge graph. The method comprises the steps of carrying out feature extraction of preset dimensions on traffic basic data to obtain multidimensional feature information, carrying out triplet extraction on traffic accident data to obtain triplet information, carrying out text matching on address data to obtain a first matching result, carrying out place name matching on the address data and a plurality of address information in a standard address library to obtain a second matching result, carrying out standard address coding and place name position matching on the first matching result and the second matching result respectively to obtain first place name position similarity corresponding to the first matching result and second place name position similarity corresponding to the second matching result, carrying out similarity calculation on the first place name position similarity and the second place name position similarity to obtain an address matching result, and constructing a traffic knowledge graph according to the multidimensional feature information, the triplet information and the address matching result.

Description

Traffic knowledge graph construction method, device, equipment and medium

Technical Field

The present application relates to the field of intelligent traffic technologies, and in particular, to a method, an apparatus, a device, and a medium for constructing a traffic knowledge graph.

Background

Traffic accidents are complex system failure events that often result from the dynamic coupling of multiple "person-vehicle-road-environment-management" factors. Traditional single-factor analysis struggles to reveal accident cause mechanisms comprehensively, whereas a knowledge graph constructed across multidimensional features can provide a data-driven decision basis for road safety management.

In the prior art, the cross-modal data fusion and knowledge graph construction method aiming at the multidimensional characteristics of the traffic accident has the following defects:

On the one hand, traditional methods rely on NLP (Natural Language Processing) named entity extraction and rule-based syntactic dependency relation extraction. Their ability to understand complex text is weak, so the accuracy of entity relation extraction is low. On the other hand, traditional methods lack systematic multi-modal data standardization and risk association methods, so accurate identification and dynamic management of high-risk groups, road sections and the like are difficult to achieve.

Therefore, how to accurately extract entity relationships and construct a multidimensional knowledge graph among the entities has become a problem to be solved urgently.

Disclosure of Invention

The embodiment of the application provides a method, a device, equipment and a medium for constructing a traffic knowledge graph, which are used for accurately extracting entity relations and constructing a multidimensional traffic knowledge graph among entities.

In a first aspect, the present application provides a method for constructing a traffic knowledge graph, where the method includes:

acquiring traffic data of a plurality of data types, wherein the data types comprise address data, traffic basic data and traffic accident data;

Carrying out feature extraction of preset dimensions on the traffic basic data to obtain multidimensional feature information of the traffic basic data; performing triplet extraction on the traffic accident data by using the fine-tuned large model to obtain triplet information;

Performing text matching on the address data to obtain a first matching result, performing place name matching on the address data and a plurality of address information in a standard address library to obtain a second matching result, performing standard address coding and place name position matching on the first matching result and the second matching result respectively to obtain a first place name position similarity corresponding to the first matching result and a second place name position similarity corresponding to the second matching result, and performing similarity calculation on the first place name position similarity and the second place name position similarity to obtain an address matching result;

And constructing a traffic knowledge graph according to the multidimensional characteristic information, the triplet information and the address matching result.

In one possible implementation manner, the performing triplet extraction on the traffic accident data by using the fine-tuned large model to obtain triplet information includes:

Text analysis and entity identification are carried out on the traffic accident data to obtain a plurality of entities corresponding to the traffic accident data;

Extracting the relation of a plurality of entities corresponding to the traffic accident data to obtain a causal relation among the entities;

and constructing a triplet of the causal relationship among the entities to obtain triplet information, wherein the triplet information comprises an accident result, an accident address and an accident reason.

In one possible embodiment, the method further comprises:

Carrying out semantic analysis on the accident address, and extracting target address elements in the accident address;

matching the target address element with a plurality of address information in a standard address library to obtain confidence degrees corresponding to the address information respectively;

and taking the address information with the confidence coefficient larger than a preset threshold value in the address information as standard address information, and taking the standard address information as an accident address in the triplet information.
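The threshold-based selection above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the confidence measure (a character-level similarity ratio), the default threshold of 0.6, the choice to keep only the single best candidate, and the function names are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def match_confidence(element: str, candidate: str) -> float:
    """Hypothetical confidence: character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, element, candidate).ratio()


def standardize_accident_address(element, standard_addresses, threshold=0.6):
    """Return the best standard address whose confidence exceeds the threshold.

    Returns None when no candidate is above the threshold, so the caller can
    keep the raw accident address instead of forcing a poor match.
    """
    scored = [(match_confidence(element, addr), addr) for addr in standard_addresses]
    above = [(conf, addr) for conf, addr in scored if conf > threshold]
    if not above:
        return None
    return max(above)[1]


if __name__ == "__main__":
    library = [
        "Renmin Road and Jiefang Road intersection",
        "Zhongshan Road and Jiefang Road intersection",
    ]
    print(standardize_accident_address("Renmin Road Jiefang Road intersection", library))
```

Returning None rather than the closest low-confidence candidate keeps unmatched addresses visible for manual review, which is one reasonable reading of the preset threshold.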

In one possible embodiment, the method further comprises:

Acquiring traffic flow data of a road state data type;

Performing data filtering and space-time unification on the traffic flow data to obtain preprocessed traffic flow data;

extracting road state indexes from the preprocessed traffic flow data to obtain the road state indexes;

And constructing a traffic knowledge graph according to the multidimensional feature information, the triplet information, the address matching result and the road state index.
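A minimal sketch of the filtering, space-time unification and index extraction steps is given below. The record fields (`ts`, `sensor`, `count`, `speed_kmh`), the 5-minute window, the speed plausibility bound, the sensor-to-segment mapping and the two indexes (flow and count-weighted mean speed) are illustrative assumptions; the application does not specify them in this passage.

```python
from collections import defaultdict

WINDOW_S = 300  # assumed aggregation window: 5 minutes


def preprocess(records, sensor_to_segment, max_speed_kmh=150.0):
    """Drop implausible readings, then align the rest to (segment, window) keys."""
    cleaned = []
    for r in records:
        if r["sensor"] not in sensor_to_segment:
            continue  # spatial unification is impossible without a segment
        if r["count"] < 0 or not (0.0 <= r["speed_kmh"] <= max_speed_kmh):
            continue  # data filtering: negative counts or implausible speeds
        cleaned.append({
            "segment": sensor_to_segment[r["sensor"]],  # unified space
            "window": r["ts"] - r["ts"] % WINDOW_S,     # unified time
            "count": r["count"],
            "speed_kmh": r["speed_kmh"],
        })
    return cleaned


def road_state_index(cleaned):
    """Aggregate flow and count-weighted mean speed per (segment, window)."""
    acc = defaultdict(lambda: [0, 0.0])
    for r in cleaned:
        bucket = acc[(r["segment"], r["window"])]
        bucket[0] += r["count"]
        bucket[1] += r["count"] * r["speed_kmh"]
    return {
        key: {"flow": n, "mean_speed_kmh": (s / n if n else 0.0)}
        for key, (n, s) in acc.items()
    }


if __name__ == "__main__":
    raw = [
        {"ts": 10, "sensor": "s1", "count": 4, "speed_kmh": 30.0},
        {"ts": 200, "sensor": "s2", "count": 6, "speed_kmh": 50.0},
        {"ts": 220, "sensor": "s2", "count": 3, "speed_kmh": 400.0},  # filtered out
    ]
    print(road_state_index(preprocess(raw, {"s1": "A", "s2": "A"})))
```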

In one possible embodiment, the method further comprises:

Acquiring view data of the abnormal data type of the road;

Extracting the abnormal road characteristics from the view data to obtain abnormal road characteristics;

And constructing a traffic knowledge graph according to the multidimensional feature information, the triplet information, the address matching result and the road abnormal feature.

In a possible implementation manner, the text matching is performed on the address data to obtain a first matching result, which includes:

word segmentation is carried out on the address data to obtain a plurality of words;

determining, for each word, a word frequency of the word in the address data based on a number of occurrences of the word in the address data and a number of words in the address data;

A first matching result is determined based on the word frequency and the inverse document frequency of the plurality of words in the address data.
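The word frequency and inverse document frequency steps above can be sketched as follows. Several details are assumptions for illustration only: whitespace tokenization stands in for the word segmentation step, the inverse document frequency uses a common smoothed form, and the first matching result is taken to be the candidate with the highest cosine similarity between TF-IDF vectors.

```python
import math
from collections import Counter


def tokenize(address):
    """Assumed word segmentation: lowercase and split on whitespace."""
    return address.lower().split()


def term_frequency(tokens):
    """Occurrences of each word divided by the number of words."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: c / total for word, c in counts.items()}


def inverse_document_frequency(corpus_tokens):
    """Smoothed IDF over the candidate addresses (an assumed variant)."""
    n = len(corpus_tokens)
    df = Counter(word for tokens in corpus_tokens for word in set(tokens))
    return {word: math.log((1 + n) / (1 + d)) + 1.0 for word, d in df.items()}


def tfidf_vector(tokens, idf):
    tf = term_frequency(tokens)
    return {word: tf[word] * idf.get(word, 0.0) for word in tf}


def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def text_match(query, candidates):
    """Return (score, candidate) for the best TF-IDF cosine match."""
    corpus = [tokenize(c) for c in candidates]
    idf = inverse_document_frequency(corpus)
    q = tfidf_vector(tokenize(query), idf)
    return max((cosine(q, tfidf_vector(t, idf)), c) for t, c in zip(corpus, candidates))


if __name__ == "__main__":
    print(text_match("renmin road 12", ["renmin road 12", "jiefang road 8"]))
```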

In one possible implementation manner, the performing place name matching on the address data and the plurality of address information in the standard address library to obtain a second matching result includes:

splitting the address in the address data to determine a special name character string and a common name character string;

respectively carrying out similarity matching on the special name character string and the special name character strings of the plurality of address information in the standard address library, and determining the special name similarity between the special name character string in the address data and the special name character string of each address information;

Respectively carrying out similarity matching on the common name character strings and the common name character strings of the plurality of address information in the standard address library, and determining the common name similarity between the common name character strings in the address data and the common name character strings of each address information;

And determining a second matching result based on the special name similarity and the common name similarity.
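A minimal sketch of the special name and common name matching described above is given below. It assumes that the common name is a trailing generic term drawn from a small hypothetical list, that string similarity is a character-level ratio, and that the two similarities are combined with fixed weights of 0.7 and 0.3; none of these choices is stated in the application.

```python
from difflib import SequenceMatcher

# Hypothetical list of generic (common) place-name terms.
COMMON_NAMES = ("road", "street", "avenue", "bridge", "intersection")


def split_place_name(address):
    """Split an address into (special name, common name) strings."""
    tokens = address.lower().split()
    if tokens and tokens[-1] in COMMON_NAMES:
        return " ".join(tokens[:-1]), tokens[-1]
    return " ".join(tokens), ""


def similarity(a, b):
    """Character-level similarity; two empty strings count as identical."""
    if not a and not b:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def place_name_match(address, standard_addresses, w_special=0.7, w_common=0.3):
    """Rank standard addresses by weighted special/common name similarity."""
    special, common = split_place_name(address)
    results = []
    for candidate in standard_addresses:
        c_special, c_common = split_place_name(candidate)
        score = (w_special * similarity(special, c_special)
                 + w_common * similarity(common, c_common))
        results.append((score, candidate))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    for score, candidate in place_name_match("Renmin Road", ["Jiefang Road", "Renmin Road"]):
        print(round(score, 3), candidate)
```

Weighting the special name more heavily reflects that many addresses share the same common name (for example "road"), so the special name carries most of the discriminating information.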

In a second aspect, the present application provides a traffic knowledge graph construction apparatus, where the apparatus includes:

a data acquisition module, used for acquiring traffic data of a plurality of data types, wherein the data types comprise address data, traffic basic data and traffic accident data;

the feature extraction module is used for carrying out feature extraction of preset dimensions on the traffic basic data to obtain multidimensional feature information of the traffic basic data;

the triplet extraction module is used for carrying out triplet extraction on the traffic accident data by using the fine-tuned large model to obtain triplet information;

The address matching module is used for carrying out text matching on the address data to obtain a first matching result, carrying out place name matching on the address data and a plurality of address information in a standard address library to obtain a second matching result, carrying out standard address coding and place name position matching on the first matching result and the second matching result respectively to obtain a first place name position similarity corresponding to the first matching result and a second place name position similarity corresponding to the second matching result, and carrying out similarity calculation on the first place name position similarity and the second place name position similarity to obtain an address matching result;

And the map construction module is used for constructing a traffic knowledge map according to the multidimensional characteristic information, the triplet information and the address matching result.

In a third aspect, the present application further provides an electronic device, where the electronic device includes a processor, and the processor is configured to implement the steps of the traffic knowledge graph construction method according to any one of the above when executing the computer program stored in the memory.

In a fourth aspect, the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the traffic knowledge graph construction method according to any one of the above.

In a fifth aspect, the present application provides a computer program product comprising a computer program which when executed by a processor implements the method of constructing a traffic knowledge graph as described in the first aspect above.

The technical scheme provided by the embodiment of the application at least has the following beneficial effects:

According to the embodiment of the application, the feature extraction of preset dimensionality is carried out on traffic basic data to obtain multi-dimensional feature information of the traffic basic data, the triple extraction is carried out on the traffic accident data by using a fine-tuned large model to obtain triple information, the text matching is carried out on the address data to obtain a first matching result, the place name matching is carried out on the address data and a plurality of address information in a standard address library to obtain a second matching result, standard address coding and place name position matching are respectively carried out on the first matching result and the second matching result to obtain first place name position similarity corresponding to the first matching result and second place name position similarity corresponding to the second matching result, similarity calculation is carried out on the first place name position similarity and the second place name position similarity to obtain an address matching result, and traffic knowledge graph is constructed according to the multi-dimensional feature information, the triple information and the address matching result.

The application uses a large model to extract information from traffic data, which enables extraction of entities and causal relations from complex text. It performs text matching and place name matching on address data using a hybrid of several matching algorithms, and performs similarity analysis on the text matching and place name matching results together with the place name position matching results to obtain an address matching result. Finally, the extracted entities and the address matching result are fused to construct the traffic knowledge graph. A knowledge graph built across multidimensional features can comprehensively reveal accident cause mechanisms and provide a data-driven decision basis for road safety management.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.

Drawings

In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the description below are only some embodiments of the present application.

FIG. 1 is a block diagram of a method for constructing a traffic knowledge graph according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a traffic knowledge graph according to an embodiment of the present application;

FIG. 3 is a flow chart of a method for constructing a traffic knowledge graph according to an embodiment of the present application;

FIG. 4 is a flow chart of a method for obtaining triplet information according to an embodiment of the present application;

FIG. 5 is a flow chart of accident address standardization according to an embodiment of the present application;

FIG. 6 is a diagram illustrating a mesh structure of address organization according to an embodiment of the present application;

FIG. 7 is a flow chart of a text matching method according to an embodiment of the present application;

FIG. 8 is a flow chart of a place name matching method according to an embodiment of the present application;

FIG. 9 is a flow chart of a traffic flow data processing method according to an embodiment of the present application;

FIG. 10 is a flow chart of a view data processing method according to an embodiment of the present application;

FIG. 11 is a schematic structural diagram of a traffic knowledge graph construction apparatus according to an embodiment of the present application;

FIG. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

It should be noted that the terms "comprises" and "comprising," along with their variants, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The terms "first," "second," and the like herein are used for descriptive purposes only and are not to be construed as either explicit or implicit relative importance or to indicate the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature, and in the description of embodiments of the application, unless otherwise indicated, the meaning of "a plurality" is two or more.

The word "exemplary" is used hereinafter to mean "serving as an example, embodiment, or illustration." Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

As noted in the background, existing approaches have two shortcomings. On the one hand, traditional methods rely on NLP named entity extraction and rule-based syntactic dependency relation extraction; their ability to understand complex text is weak, so the accuracy of entity relation extraction is low. On the other hand, they lack systematic methods for multi-modal data standardization and risk association, so accurate identification and dynamic management of high-risk groups, road sections and the like are difficult to achieve.

In view of the above, the present application provides a method, apparatus, device and medium for constructing a traffic knowledge graph, which are used for accurately extracting entity relationships and constructing a multidimensional traffic knowledge graph between entities.

The inventive concept of the application can be summarized as follows. Feature extraction of preset dimensions is performed on the traffic basic data to obtain multidimensional feature information of the traffic basic data, and triplet extraction is performed on the traffic accident data by using a fine-tuned large model to obtain triplet information. Text matching is performed on the address data to obtain a first matching result, and place name matching is performed on the address data and a plurality of address information in a standard address library to obtain a second matching result. Standard address coding and place name position matching are performed on the first matching result and the second matching result respectively to obtain a first place name position similarity corresponding to the first matching result and a second place name position similarity corresponding to the second matching result, and similarity calculation is performed on the two similarities to obtain an address matching result. A traffic knowledge graph is then constructed according to the multidimensional feature information, the triplet information and the address matching result.

The construction method of the traffic knowledge graph is applied to electronic equipment, and the electronic equipment can be a PC, a mobile terminal, terminal equipment, a server and the like. The construction method of the traffic knowledge graph can be applied to a distributed software platform represented by a blockchain.

After the main inventive concept of the embodiment of the present application is introduced, a flow of the method for constructing a traffic knowledge graph provided by the present application is described with reference to fig. 1.

As shown in fig. 1, the construction method of the traffic knowledge graph in the application comprises the following steps:

To address the multidimensional "people-vehicle-road-environment-event" characteristics of traffic accidents and to achieve cross-modal data fusion and knowledge graph construction, the data types of traffic data mainly include address data, traffic basic data, traffic accident data, road state data, and road abnormal data.

Traffic data of each data type is processed in its own way: the data is standardized, multidimensional features, causal relations and entities are extracted, and finally a graph among the entities is constructed. The specific flow is shown in FIG. 1:

For traffic basic data on drivers, vehicles and the like, features of preset dimensions are first processed and extracted to obtain multidimensional feature information of the traffic basic data, and a basic graph is then constructed from the multidimensional feature information.

For traffic accident data such as police incident reports, triplet extraction is first performed with a fine-tuned large model to obtain triplet information, the accident address in the triplet information is then standardized, and an accident graph is finally constructed from the triplet information.

For address data, road address data is first standardized to obtain a plurality of address information in a standard address library, the intersection relationships of the address data are then analyzed, and an address relationship graph is finally constructed from the intersection relationship analysis.

For traffic flow data of the road state data type, space and time are first unified, road state indexes are then extracted, and an address-road state graph is finally constructed from the road state indexes and the unified space and time.

For view data of the road abnormal data type, road abnormal features are first extracted with a large model, the addresses of the road abnormal features are then standardized, and an address-road abnormal graph is finally constructed from the road abnormal features and the addresses.

Finally, the basic graph, the accident graph, the address relationship graph, the address-road state graph and the address-road abnormal graph are fused to construct the final traffic knowledge graph.
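The fusion step can be illustrated with a minimal sketch in which each sub-graph is a collection of (head, relation, tail) triples and fusion is a set union. This representation, the relation names and the example entities are assumptions for illustration; the application does not state how the sub-graphs are stored or merged. The sketch relies on standardized addresses being written identically in every sub-graph, which is what lets the sub-graphs connect after fusion.

```python
from collections import defaultdict


def fuse_graphs(*subgraphs):
    """Union the triples of all sub-graphs; exact duplicates collapse."""
    fused = set()
    for graph in subgraphs:
        fused |= set(graph)
    return fused


def neighbors(graph, entity):
    """Group the outgoing edges of an entity by relation name."""
    out = defaultdict(set)
    for head, relation, tail in graph:
        if head == entity:
            out[relation].add(tail)
    return dict(out)


if __name__ == "__main__":
    address = "People's Road and Liberation Road intersection"
    accident_graph = [
        ("pedestrians run the red light", "occurs_at", address),
        ("pedestrians run the red light", "causes", "traffic accident risk"),
    ]
    road_state_graph = [(address, "has_state", "congested")]
    road_abnormal_graph = [(address, "has_abnormality", "damaged signal")]

    kg = fuse_graphs(accident_graph, road_state_graph, road_abnormal_graph)
    print(neighbors(kg, address))
```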

To explain traffic data of different data types and features to be extracted required in the present application, information of specific traffic data is shown in table 1:

TABLE 1

Performing entity construction according to the data types and the extracted features in the table 1, wherein the information of the entity to be constructed is shown in table 2:

TABLE 2

The core association between entities constructed in table 2 is shown in table 3:

TABLE 3

And constructing a final traffic knowledge graph according to the flow shown in fig. 1, the entity information in the table 2 and the core association relation among the entities in the table 3.

The constructed traffic knowledge graph is shown in FIG. 2. Associations between traffic events, and between roads and events, are extracted from traffic accident data such as police incident reports; for example, road congestion caused by illegal parking. Potential hazard events at a road or intersection are analyzed from view data of the road abnormal data type. Traffic flow characteristics and road characteristics of intersections and roads, such as congestion and intersection saturation, are analyzed from traffic flow data of the road state data type. Associations are established between intersection and road information and standard addresses. Environment information associated with the roads at an intersection is obtained through the relationship between standard addresses and areas. Associations between persons and violation records are established from traffic basic data such as traffic violations and accident information. Driver and vehicle entities are constructed from traffic basic data such as basic personnel and vehicle information, medical records, and violation records.

In order to further explain the technical solution provided by the embodiments of the present application, the following details are described with reference to the accompanying drawings and the detailed description. Although embodiments of the present application provide the method operational steps shown in the following embodiments or figures, more or fewer operational steps may be included in the method based on routine or non-inventive labor. In steps where there is logically no necessary causal relationship, the execution order of the steps is not limited to the execution order provided by the embodiments of the present application.

Referring to FIG. 3, which is a flow chart of a method for constructing a traffic knowledge graph according to an embodiment of the present application, the method may be implemented as the following steps:

In step S301, traffic data of a plurality of data types including address data, traffic base data, and traffic accident data is acquired.

In step S302, feature extraction of preset dimensions is performed on the traffic basic data to obtain multidimensional feature information of the traffic basic data, and triple extraction is performed on the traffic accident data by using a fine-tuned large model to obtain triple information.

In the application, feature extraction of preset dimensions is performed on the traffic basic data to obtain the multidimensional feature information of the traffic basic data, which is specifically implemented as follows:

To meet the need for driver risk assessment, the application integrates several kinds of personnel information, such as basic attributes, social security records, regulatory violations, legal offenses, and repeated use of psychotropic drugs, and provides a systematic method for standardized processing of multidimensional features and risk association. After standardization, the data is stored as attributes of the person entity. By applying standardized processing, standardized risk labeling, and risk association logic to 11 core driver features in the preset dimensions, the scheme provides a core basis for driver risk assessment within traffic multi-modal data fusion and knowledge graph construction, and supports accurate identification and dynamic management of high-risk groups.

The core characteristics, the standardized processing method and the risk association logic of 11 preset dimensions of the driver are shown in table 4:

TABLE 4

Meanwhile, road entity features can be constructed by analyzing accident-prone locations of road traffic accidents in historical case data and grading roads according to the number and severity of traffic accidents on each road section. The road grading rules are shown in Table 5:

TABLE 5

In one possible implementation, triplet extraction is performed on the traffic accident data using the fine-tuned large model to obtain triplet information, through the steps shown in FIG. 4:

in step S401, text analysis and entity recognition are performed on the traffic accident data, so as to obtain a plurality of entities corresponding to the traffic accident data.

In step S402, relation extraction is performed on the plurality of entities corresponding to the traffic accident data to obtain the causal relationships among the entities.

In step S403, a triplet is constructed for the causal relationship between the entities to obtain triplet information, where the triplet information includes an accident result, an accident address, and an accident reason.

In a specific implementation, the application needs to extract accident results, accident addresses and accident causes from the text of traffic accident data such as police incident reports.

The traditional approach extracts named entities with NLP and extracts entity relations using rule-based methods and syntactic dependencies. Its accuracy is low and it does not understand complex text. The application therefore constructs inference data samples and extracts events and relations with a method based on large-model fine-tuning. The samples supply the logical and causal relationships of the text description for training the large model, which improves the accuracy of triplet identification for the graph and identifies the main events in the text description of traffic accident data and the causal relationships among them.

By performing targeted fine tuning training on the large model, the accuracy of performing triplet extraction on the text data is improved. The emphasis on performing triplet extraction is on performing causal relationship extraction for road traffic events.

An inference sample is constructed and the model is fine-tuned as follows. First, text data of traffic accidents is collected and cleaned to remove duplicate and redundant data. The data is then screened, and balanced sample selection is performed based on the traffic accident classification. The selected data is annotated with inference labels; the annotation process comprises 5 steps: entity element identification, causal relationship analysis, data standardization, triplet splicing, and triplet extraction. The inference sample is built from the annotated data, the model is fine-tuned on it, and finally triplet extraction is performed with the fine-tuned large model to obtain the triplet information.
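The data cleaning and balanced sample selection steps can be sketched as follows. The exact-match deduplication, the per-class quota, the fixed random seed and the category labels are assumptions for illustration; the inference labeling and fine-tuning steps are not shown because the application does not specify them at this level of detail.

```python
import random
from collections import defaultdict


def clean(texts):
    """Strip whitespace, drop empty entries and exact duplicates (order kept)."""
    stripped = (t.strip() for t in texts)
    return list(dict.fromkeys(t for t in stripped if t))


def balanced_sample(labeled_texts, per_class, seed=0):
    """Select up to per_class texts from each accident category."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in labeled_texts:
        by_class[label].append(text)
    sample = []
    for label in sorted(by_class):
        items = by_class[label]
        for text in rng.sample(items, min(per_class, len(items))):
            sample.append((text, label))
    return sample


if __name__ == "__main__":
    reports = clean([" rear-end at low speed ", "rear-end at low speed", "", "scratch while parking"])
    print(reports)
    labeled = [(f"report {i}", "rear-end") for i in range(5)] + [("report 5", "scratch")]
    print(balanced_sample(labeled, per_class=2))
```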

For example, the traffic accident data is: "At the People's Road and Liberation Road intersection, pedestrians often run the red light to cross the road. I worry this will cause a traffic accident, and suggest strengthening management or adding facilities."

Text analysis is performed first. Disassembling the text structure gives the following. "People's Road and Liberation Road intersection" is an explicit location description and serves as the background of the event. "Pedestrians often run the red light to cross the road" describes a repeated behavior (a high-frequency event) and is the core of the problem. "I worry this will cause a traffic accident" expresses causality: "this" refers to the preceding behavior and directly links the cause (running the red light) to the result (a traffic accident). "Suggest strengthening management or adding facilities" is a suggestion and is not used directly for triplet extraction, because it is a follow-up rather than a causal entity. The text emphasizes that pedestrians running the red light may lead to a traffic accident, which is a typical causal chain in road traffic. The analysis yields the following key elements: location = "People's Road and Liberation Road intersection"; cause event = "pedestrians often run the red light to cross the road"; result event = "traffic accident" (potential risk).

And (3) scanning the text key elements, wherein the 'people road liberation road crossing' is a definite place and is classified as an entity type 'place'. "red light running across the road" is a dynamic behavior classified as an entity type "event" representing an illegal action. "traffic accidents" are potential consequences, categorized as entity type "events", representing risks. Therefore, entity identification is carried out on the text key elements, and a plurality of entities corresponding to the traffic accident data are obtained, wherein the addresses are people road liberation intersections, the reasons are that red lights are broken through roads, and the traffic accident results.

And extracting the relation of a plurality of entities corresponding to the traffic accident data to obtain the causal relation among the entities, wherein the causal relation is that pedestrians worry about running red light to cause traffic accidents and the traffic accident risk is worried about running red light.

In the original text of 'the pedestrians frequently run the red light and cross the road', the 'frequent' represents the frequency, but the core is that the 'the pedestrians run the red light' is optimized as a concise event, namely 'the pedestrians run the red light', the text is 'worry will cause', the result is marked as 'traffic accident risk' (distinguishing actual occurrence and potential risk), the accident level is 1, and the triplet information is obtained.

Standardization is then applied and a risk level label is constructed. Traffic accidents are divided into 5 levels according to accident severity and social impact (level 1 is the least serious, level 5 the most serious). The risk levels are defined as follows:

Level 1: minor accident, or a hidden danger or concern about a possible accident. There are no casualties or only slight scratches, and the loss is low, generally below 1,000 yuan. Typical scenarios include low-speed rear-end collisions, parking scrapes, minor single-vehicle accidents, and reported concerns that an accident may occur.

Level 2: general accident. 1 to 2 people are slightly injured and need medical assistance but their lives are not in danger, vehicles are partially damaged, and the loss is about 1,000 yuan to 30,000 yuan. Typical scenarios include slight multi-vehicle collisions and slight injuries in which the person remains mobile.

Level 3: relatively major accident with casualties. Generally 1 to 2 people are seriously injured (mobility is lost and hospitalization is needed) or more than 3 people are slightly injured, vehicles are seriously damaged or several vehicles are damaged, and the loss is about 30,000 yuan to 200,000 yuan (RMB). Typical scenarios include high-speed multi-vehicle rear-end collisions and the rollover of a passenger car.

Level 4: major accident. The accident causes 1 to 2 deaths or more than 3 serious injuries, and the property loss is more than 200,000 yuan and less than 1,000,000 yuan. Typical scenarios include hazardous chemical leakage, a bus falling off a cliff and a serious school bus collision.

Level 5: particularly serious accident. The accident causes more than 3 deaths or more than 10 serious injuries, the property loss exceeds 1,000,000 yuan, and a national emergency response needs to be started. Typical scenarios include tunnel or bridge collapse, chain explosions and other extremely serious traffic accidents.
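The five levels above can be read as a cascade of threshold checks from the most serious level downward. The following is a minimal sketch, not the applicant's implementation: the function name and parameters are hypothetical, and it assumes that "more than N people" is read inclusively (N or more) and that meeting any single criterion of a level is enough to assign that level.

```python
def classify_risk_level(deaths=0, serious_injuries=0, minor_injuries=0, loss_yuan=0.0):
    """Return a risk level from 1 (least serious) to 5 (most serious).

    Assumption: "more than N" in the level descriptions is treated as ">= N",
    and any one criterion of a level is sufficient to reach that level.
    """
    if deaths >= 3 or serious_injuries >= 10 or loss_yuan > 1_000_000:
        return 5
    if deaths >= 1 or serious_injuries >= 3 or loss_yuan > 200_000:
        return 4
    if serious_injuries >= 1 or minor_injuries >= 3 or loss_yuan > 30_000:
        return 3
    if minor_injuries >= 1 or loss_yuan >= 1_000:
        return 2
    # No casualties and low loss, or only a reported concern about a possible accident.
    return 1
```

A reported concern with no casualties and no loss, as in the red-light example, falls through every check and is assigned level 1.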

Finally, the triplet is constructed and output in standardized form: the accident address is the intersection of Renmin Road and Jiefang Road, the accident cause is running a red light to cross the road, and the accident result is a level 1 traffic accident risk.
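One way to picture the standardized output is as a small record holding the address, cause, result and risk level. The sketch below is illustrative only; the class name, field names and output keys are hypothetical and are not defined in the application, and the address is the example intersection written as "Renmin Road and Jiefang Road intersection".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccidentTriplet:
    """Hypothetical container for one standardized accident triplet."""
    accident_address: str
    accident_cause: str
    accident_result: str
    risk_level: int  # 1 (least serious) to 5 (most serious)

    def to_output(self):
        # Attach the risk level to the result, as in the standardized example.
        return {
            "accident_address": self.accident_address,
            "accident_cause": self.accident_cause,
            "accident_result": f"level {self.risk_level} {self.accident_result}",
        }


triplet = AccidentTriplet(
    accident_address="Renmin Road and Jiefang Road intersection",
    accident_cause="running a red light to cross the road",
    accident_result="traffic accident risk",
    risk_level=1,
)
print(triplet.to_output())
```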

In a possible embodiment, the method provided in the present application may also be performed as steps as shown in fig. 5:

in step S501, the accident address is semantically parsed, and the target address element in the accident address is extracted.

In step S502, the target address element is matched with a plurality of address information in the standard address library, so as to obtain confidence levels corresponding to the plurality of address information.

In step S503, address information with confidence degree larger than a preset threshold value in the plurality of address information is used as standard address information, and the standard address information is used as accident address in the triplet information.

In a specific implementation, the large model performs semantic analysis on the accident address and extracts target address elements such as road names, landmarks and orientation words (for example "south side" or "nearby"). This resolves the ambiguity of fuzzy expressions such as "nearby", whose specific range must be determined from the context. The parsed target address elements are then matched against the address information in the standard address library along several dimensions, such as road name matching, landmark association matching and orientation word calibration. Address fragments with high confidence, such as an explicit road name like "Renmin Road", are matched first, and the address information whose confidence is greater than the preset threshold is used as the standard address information.

The standard address information uses a unified address format. For example, an expression such as "XX road and XX road intersection" is normalized to a single canonical form, redundant expressions such as the prefix word "at" are removed, and the output address is ensured to conform to an address coding standard, for example the GB/T 2260-2020 administrative division codes.

The preset threshold is set according to actual needs and is not limited in the present application.

In the application, after standard address information is obtained, the standard address information is added to a standard address library, and the coverage range of a knowledge base is dynamically expanded.

In the application, the standard address library mainly comprises administrative areas, roads, POIs and the like, and comprises the following main elements:

Administrative division elements are the address elements corresponding to administrative divisions above and below the county level, such as province, city, county and township. They are placed at the front of an address and include province, city, county (district), development zone, township, street, community and administrative village. Regional address elements are address information at the level of natural villages, business districts, roads, house numbers and residential compounds. Location address elements include buildings, units, floors, room numbers and address description information, for example "No. XX, XX Community, XX Street, XX District, XX City". The library also records hierarchical relationships (such as province, city, district, street, house number), geographic coordinates (such as WGS84 or GCJ-02) and address aliases (for example, an alias mapped to the official name "Qingdao Jiaodong International Airport").

The address organization structure network is shown in fig. 6. The address information in the standard address library consists of address elements according to a certain rule. The core attributes of the address information in the standard address library are shown in table 6:

TABLE 6

In the present application, the Geohash algorithm is used to encode the geographic position of the address information in the standard address library. Region boundaries are divided according to region radii of 2 m, 20 m, 60 m, 200 m, 1 km and 5 km, so that regional information can be aggregated and analyzed for different service scenarios. In this way the address data in the traffic accident data can be associated with data of other modalities, for example:

A fuzzy description in a police alert text (such as "south side of the XX intersection") can be located to a 60 m-level region through Geohash, and a quantitative association is established with the phase difference adjustment data in the signal timing for the same Geohash region, forming a mapping chain of semantic description, spatial coding and signal parameters.

If the microscopic influence (10 m radius) of pedestrians running red lights needs to be analyzed, an 8-character Geohash code "ws18g5vj" is generated, corresponding to a rectangular area of 24 m × 24 m and covering the 10 m range on the south side of the intersection;

If the traffic flow within the surrounding 5 km needs to be associated, a 4-character code "ws18" is generated, corresponding to a rectangular area of 10 km × 5 km and covering the overall traffic situation of the streets around the intersection.

Through multi-precision Geohash coding, full granularity data fusion from 'single-point event' to 'regional influence' can be realized, and a unified space reference is provided for traffic accident cause analysis.
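The multi-precision idea can be illustrated with a plain implementation of the public Geohash scheme, in which longitude and latitude bits are interleaved and every 5 bits are mapped to one base-32 character. This is a sketch rather than the applicant's code; the function name is hypothetical and the coordinates in the usage lines are arbitrary sample values, not data from the application. Because a shorter code is always a prefix of a longer code for the same point, coarse and fine regions can be associated by prefix comparison.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=7):
    """Encode a latitude/longitude pair as a Geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    refine_lon = True  # bits alternate, starting with longitude
    while len(bits) < precision * 5:
        if refine_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1)
                lon_lo = mid
            else:
                bits.append(0)
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        refine_lon = not refine_lon
    chars = []
    for i in range(0, len(bits), 5):
        index = 0
        for bit in bits[i:i + 5]:
            index = (index << 1) | bit
        chars.append(_BASE32[index])
    return "".join(chars)


# Sample point only; a fine code and a coarse code for the same location.
fine = geohash_encode(57.64911, 10.40744, precision=7)
coarse = geohash_encode(57.64911, 10.40744, precision=4)
print(fine, coarse, fine.startswith(coarse))
```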

Thus, the attribute information shown in table 7 is obtained, and a standard address library is constructed based on the attribute information shown in table 7.

TABLE 7

In most cases the address information extracted from traffic data is not standardized and is not expressed as longitude and latitude. The position information extracted from traffic flow data and view data, however, carries checkpoint or road section information, so an association can be established between the traffic flow data and the view data. The present application therefore applies Geohash coding to the position information extracted from the traffic flow data and the view data, associates the resulting codes with the address codes in the standard address library, and constructs the matching relationship through a matching algorithm.

Since the length of a Geohash code is determined by the required precision, the present application uses a code length of 7 characters, whose range is about 152 meters.
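The relation between code length and range follows from how Geohash splits its bits: a code of p characters carries 5p bits, with longitude receiving the extra bit when 5p is odd. The sketch below estimates the cell size at the equator under that standard scheme; the function name and the metres-per-degree constant are assumptions made for illustration, and the east-west width shrinks with the cosine of the latitude away from the equator.

```python
METERS_PER_DEGREE = 111_320.0  # approximate length of one degree at the equator


def geohash_cell_size_m(precision):
    """Return the approximate (width, height) in metres of a Geohash cell at the equator."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit for odd totals
    lat_bits = total_bits // 2
    width = 360.0 / (2 ** lon_bits) * METERS_PER_DEGREE
    height = 180.0 / (2 ** lat_bits) * METERS_PER_DEGREE
    return width, height


for p in range(4, 9):
    w, h = geohash_cell_size_m(p)
    print(p, round(w, 1), round(h, 1))
```

For a 7-character code both sides come out at roughly 153 m, which agrees with the range of about 152 meters stated above.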

In step S303, text matching is performed on the address data to obtain a first matching result, place name matching is performed on the address data and a plurality of address information in a standard address library to obtain a second matching result, standard address coding and place name position matching are performed on the first matching result and the second matching result respectively to obtain a first place name position similarity corresponding to the first matching result and a second place name position similarity corresponding to the second matching result, and similarity calculation is performed on the first place name position similarity and the second place name position similarity to obtain an address matching result.

In a specific implementation, the present application performs address matching with a hybrid of several algorithms: the TF-IDF algorithm and the KMP matching algorithm are fused and improved on that basis. Address description information and longitude-latitude position coding information are combined to match addresses comprehensively and to associate and fuse them. The relevance between address data is considered from both the place name and the position information: the place name similarity is first calculated from the place name, the place name position similarity is then calculated from the position information, and finally the address data are matched and associated by combining the place name similarity and the position similarity.

In a possible implementation manner, text matching is performed on address data in the present application, so as to obtain a first matching result, which may be further performed as steps shown in fig. 7:

in step S701, the address data is segmented to obtain a plurality of words.

In step S702, for each word, a word frequency of the word in the address data is determined based on the number of times the word appears in the address data and the number of words in the address data, and an inverse document frequency of the word is determined based on the number of documents including the word in the address data.

In step S703, a first matching result is determined based on the word frequency and the inverse document frequency of the plurality of words in the address data.

In a specific implementation, the text is segmented into words and the TF-IDF algorithm is applied. The first matching result is calculated as:

TF-IDF = TF × IDF.

Here TF (Term Frequency) is the word frequency of a word in the address data: TF = number of times the word appears in the document / total number of words in the document. IDF is the inverse document frequency: IDF = log(total number of documents / (number of documents containing the word + 1)); 1 is added to the denominator to prevent it from being 0.
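The two formulas can be sketched directly. The example below assumes the addresses have already been segmented into word lists; the function name and the toy corpus are hypothetical, and a real deployment would need a Chinese word segmenter that the application does not name.

```python
import math
from collections import Counter


def tf_idf(document, corpus):
    """Return {word: TF x IDF} for one segmented document against a corpus.

    TF  = occurrences of the word in the document / total words in the document
    IDF = log(total documents / (documents containing the word + 1))
    """
    counts = Counter(document)
    total_words = len(document)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        tf = count / total_words
        docs_with_word = sum(1 for doc in corpus if word in doc)
        idf = math.log(n_docs / (docs_with_word + 1))
        scores[word] = tf * idf
    return scores


# Toy corpus of already-segmented addresses (illustrative values only).
corpus = [
    ["renmin", "road", "intersection"],
    ["jiefang", "road"],
    ["hean", "road"],
    ["renmin", "square"],
]
scores = tf_idf(corpus[0], corpus)
print(scores)
```

With the "+1" in the denominator, a word that appears in all but one document, such as "road" here, receives an IDF of log(1) = 0, and a word that appears in every document receives a negative IDF.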

In one possible implementation manner, the address data and the address information in the standard address library are subjected to place name matching, so that a second matching result is obtained, and the steps shown in fig. 8 are further executed:

In step S801, the addresses in the address data are split, and a special name character string and a general name character string are determined;

In step S802, similarity matching is performed between the special name character string and the special name character strings of the plurality of address information in the standard address library, and the special name similarity between the special name character string in the address data and the special name character string of each address information is determined;

In step S803, similarity matching is performed between the general name character string and the general name character strings of the plurality of address information in the standard address library, and the general name similarity between the general name character string in the address data and the general name character string of each address information is determined;

In step S804, the second matching result is determined based on the special name similarity and the general name similarity.

In a specific implementation, the address is split into a "special name + general name" structure. The special name is the text segment that distinguishes one place name from another; for example, in "No. 12 He'an Road, Shibei District, Qingdao City, Shandong Province", "Shandong", "Qingdao", "Shibei" and "He'an" are special names. The general name usually follows the special name; in the same address, "Province", "City" and "District" are general names.

First, the special name similarity is calculated. The similarity of the special name character strings in their literal expression is calculated by drawing on the idea of the KMP (Knuth-Morris-Pratt) algorithm.

Let the special name character strings be A = {a1, a2, ..., am} and B = {b1, b2, ..., bn}, where ai (0 < i ≤ m) and bj (0 < j ≤ n) denote the Chinese characters in the special name character strings A and B respectively, and m and n are the character lengths of A and B respectively:

First, the lengths of the two special name character strings are compared, and the shorter one is split into single characters to form the target set. Assuming that the special name character string B is the shorter one, the resulting character set is {b1, b2, ..., bn}.

Then the characters bt (0 < t ≤ n) are taken from the character set of B in order from front to back, and a matching character is searched for in A from the front. If a matching character ax = bt (0 < x ≤ m) is found, A is cut at position ax, the characters after ax form the substring A2 of A, A2 is used as the input string for the subsequent matching, and matching proceeds to the next character bt+1. If no matching character is found, matching moves on to the next character in the character set of B. This continues until every character in the character set of B has gone through the matching process. Let k be the number of characters matched between A and B; the special name similarity S special(A, B) between A and B is then calculated as S special(A, B) = k / n × 100%.
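The matching procedure described above can be sketched as an ordered character search over a shrinking suffix of the longer string. The function name is hypothetical, the result is returned as a fraction between 0 and 1 rather than a percentage, and returning 0 for an empty string is an assumption added here to avoid division by zero.

```python
def special_name_similarity(a, b):
    """Ordered character matching between two special name strings.

    The shorter string supplies the target characters; each one is searched
    for in what remains of the longer string, and on a hit the longer string
    is cut just after the matched position. Returns k / n, where k is the
    number of matched characters and n is the length of the shorter string.
    """
    longer, shorter = (a, b) if len(a) >= len(b) else (b, a)
    if not shorter:
        return 0.0  # assumption: an empty name matches nothing
    remaining = longer
    matched = 0
    for ch in shorter:
        pos = remaining.find(ch)
        if pos != -1:
            matched += 1
            remaining = remaining[pos + 1:]  # continue after the matched character
        # If not found, simply move on to the next character of the shorter string.
    return matched / len(shorter)


print(special_name_similarity("abcde", "aec"))
print(special_name_similarity("青岛市北", "青岛"))
```

In the first call, "a" and "e" are matched in order but "c" is no longer available after the cut at "e", so 2 of 3 characters match.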

Second, the general name similarity S general(A, B) is calculated. The similarity of the place name types is analyzed from the general name information, and a corresponding weight is applied when the place name similarity is calculated. If the general names are not identical, the place name type similarity is assigned according to the difference in the classification attributes of the place names: if the minor classes of the place name classification are the same, the type similarity is 0.8; if the minor classes differ but the middle classes are the same, it is 0.6; and if the middle classes differ, it is 0.4.

Finally, the special name similarity and the general name similarity are combined to obtain the second matching result: S(A, B) = S special(A, B) × S general(A, B).
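The type-based weighting and the final product can be sketched as follows. The class and field names, and the sample classification labels, are hypothetical; the application does not list the place name classification itself. The value 1.0 for identical general names is an assumption, since the text only specifies values for general names that differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneralName:
    """Hypothetical general name with its place name classification."""
    name: str          # e.g. "Road", "Street", "District"
    middle_class: str  # middle class of the place name classification
    minor_class: str   # minor class (subclass) within the middle class


def general_name_similarity(a, b):
    if a.name == b.name:
        return 1.0  # assumption: identical general names get full weight
    if a.middle_class != b.middle_class:
        return 0.4
    if a.minor_class != b.minor_class:
        return 0.6
    return 0.8  # different general names in the same minor class


def place_name_similarity(special_similarity, general_similarity):
    """Second matching result: S(A, B) = S special(A, B) x S general(A, B)."""
    return special_similarity * general_similarity


road = GeneralName("Road", middle_class="transport", minor_class="urban road")
street = GeneralName("Street", middle_class="transport", minor_class="urban road")
bridge = GeneralName("Bridge", middle_class="transport", minor_class="bridge")
district = GeneralName("District", middle_class="administrative", minor_class="district")

print(general_name_similarity(road, street))   # same minor class
print(general_name_similarity(road, bridge))   # same middle class only
print(general_name_similarity(road, district)) # different middle class
```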

After a first matching result is obtained based on the TF-IDF algorithm and a second matching result is obtained based on the KMP address matching algorithm, standard address coding and place name position matching are respectively carried out on the first matching result and the second matching result, and a first place name position similarity corresponding to the first matching result and a second place name position similarity corresponding to the second matching result are obtained.

In a specific implementation, the first matching result and the second matching result are first address-coded with Geohash codes, and place name position matching is then performed to obtain the similarity of each address code. The calculation method is as follows:

where GEOi denotes the address code ranked i-th after the matching result is sorted by similarity in descending order, and ti denotes the similarity ranked i-th in the matching result.

Through the formula, the first place name position similarity of the first matching result is obtained as T = {(GEO1, t1), (GEO2, t2), ...}, and the second place name position similarity of the second matching result is obtained as K = {(GEO1, t1), (GEO2, t2), ..., (GEOf, tf)};

and finally, carrying out similarity calculation on the first place name position similarity and the second place name position similarity according to the following formula:

After the similarity calculation, the Geohash code that has the highest similarity weight, and whose weight is greater than a preset threshold, is used as the address matching result. The preset threshold is set according to actual requirements.
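The final selection step can be sketched independently of how the weights are fused. The function below takes a mapping from Geohash code to an already computed similarity weight and returns the best code only if it clears the threshold; the function name, the sample codes and the sample weights are hypothetical, and the fusion formula itself is not reproduced here.

```python
def select_address_match(weights, threshold):
    """Pick the Geohash code with the highest weight if it exceeds the threshold.

    weights: dict mapping Geohash code -> fused similarity weight (computed elsewhere).
    Returns the selected code, or None when no code qualifies.
    """
    if not weights:
        return None
    best_code = max(weights, key=weights.get)
    return best_code if weights[best_code] > threshold else None


candidates = {"ws18g5v": 0.91, "ws18g5u": 0.42}  # illustrative values only
print(select_address_match(candidates, threshold=0.6))
print(select_address_match(candidates, threshold=0.95))
```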

In step S304, a traffic knowledge graph is constructed according to the multidimensional feature information, the triplet information and the address matching result.

In a possible embodiment, the method provided in the present application may also be performed as steps as shown in fig. 9:

in step S901, obtaining traffic flow data of a road status data type;

In step S902, data filtering and space-time unification are carried out on the traffic flow data to obtain preprocessed traffic flow data;

In step S903, a traffic knowledge graph is constructed according to the multidimensional feature information, the triplet information, the address matching result, and the road status index.

In a specific implementation, for traffic flow data of the road state data type, the road state at intersections is analyzed by calculating the traffic flow, average speed, congestion index and mean flow in real time, and a real-time road entity relationship graph is constructed.

The data is filtered first. Abnormal records in the traffic flow data whose speed reaches or exceeds 200% of the speed limit of the road section are eliminated; for example, a record of 120 km/h is filtered out when the speed limit of an urban road is 60 km/h. Track point jumps are also corrected; for example, when the distance between adjacent points exceeds 500 m while the time difference is less than 1 second, the point is marked as an equipment fault and eliminated.
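The two filtering rules can be sketched as a single pass over time-ordered records. The record layout (keys `t`, `lat`, `lon`, `speed_kmh`), the function names and the use of the haversine great-circle distance are assumptions made for illustration; the comparison `>=` follows the 120 km/h example, in which a speed equal to twice the limit is filtered.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    radius = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def filter_records(records, speed_limit_kmh):
    """Drop over-speed records and track point jumps from time-ordered records."""
    kept = []
    for rec in records:
        # Rule 1: speed at or above 200% of the road section speed limit.
        if rec["speed_kmh"] >= 2.0 * speed_limit_kmh:
            continue
        # Rule 2: jump of more than 500 m within less than 1 second
        # relative to the last record that was kept.
        if kept:
            prev = kept[-1]
            dt = rec["t"] - prev["t"]
            dist = haversine_m(prev["lat"], prev["lon"], rec["lat"], rec["lon"])
            if dist > 500.0 and dt < 1.0:
                continue
        kept.append(rec)
    return kept


records = [
    {"t": 0.0, "lat": 36.0000, "lon": 120.0, "speed_kmh": 40},
    {"t": 0.5, "lat": 36.0100, "lon": 120.0, "speed_kmh": 45},   # about 1.1 km jump in 0.5 s
    {"t": 1.0, "lat": 36.0001, "lon": 120.0, "speed_kmh": 42},
    {"t": 2.0, "lat": 36.0002, "lon": 120.0, "speed_kmh": 120},  # twice a 60 km/h limit
]
clean = filter_records(records, speed_limit_kmh=60)
print([r["t"] for r in clean])
```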

Space-time unification is then performed: all data timestamps are converted to the UTC+8 time zone, device coordinates (such as GCJ-02 and BD-09) are converted to the WGS84 standard coordinate system with the error controlled within ±2 m, and multi-precision Geohash codes are generated at the same time, so that spatial association with police alert data and accident data is realized through the geographic position code.
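The timestamp part of this step can be sketched with the standard library alone. The function name is hypothetical, and treating timestamps without time zone information as UTC is an assumption that would need to be checked against each device. The coordinate conversion from GCJ-02 or BD-09 to WGS84 is not shown.

```python
from datetime import datetime, timedelta, timezone

UTC_PLUS_8 = timezone(timedelta(hours=8))


def to_utc8(ts):
    """Convert a datetime to the UTC+8 time zone.

    Assumption: naive timestamps (no tzinfo) are interpreted as UTC.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(UTC_PLUS_8)


sample = datetime(2025, 10, 13, 0, 0, tzinfo=timezone.utc)
print(to_utc8(sample).isoformat())
```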

Finally, the traffic flow of intersections and trunk roads is extracted from the real-time traffic flow data, and road state indexes such as the real-time flow, average speed, congestion index and mean flow over the last hour are calculated.

The calculation of road state indexes such as the real-time flow, average speed, congestion index and mean flow over the last hour is prior art and is not described in detail here.
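For orientation only, three of these indexes can be sketched from a list of vehicle passes at one intersection or road section. The definitions used here (vehicle count in the latest window, mean speed in that window, and mean count per window over the last hour), the 300-second window and the function name are all assumptions; the congestion index is left out because its definition is not given in the text.

```python
def road_state_indexes(passes, now, window_s=300):
    """Compute simple road state indexes from (timestamp_s, speed_kmh) passes.

    Assumed definitions:
      flow            = vehicles counted in the latest window
      avg_speed       = mean speed of those vehicles (None if no vehicles)
      mean_flow_hour  = mean vehicles per window over the last hour
    """
    recent = [(t, v) for t, v in passes if now - window_s < t <= now]
    last_hour = [(t, v) for t, v in passes if now - 3600 < t <= now]
    flow = len(recent)
    avg_speed = sum(v for _, v in recent) / flow if flow else None
    windows_per_hour = 3600 / window_s
    mean_flow_hour = len(last_hour) / windows_per_hour
    return {"flow": flow, "avg_speed": avg_speed, "mean_flow_hour": mean_flow_hour}


# One vehicle per minute at 30 km/h for an hour (synthetic data).
passes = [(t, 30.0) for t in range(0, 3600, 60)]
print(road_state_indexes(passes, now=3599))
```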

The calculation method of the road state index is shown in table 8:

TABLE 8

In a possible embodiment, the method provided in the present application may also be performed as steps as shown in fig. 10:

in step S1001, view data of a road abnormality data type is acquired;

in step S1002, road abnormal feature extraction is performed on the view data to obtain road abnormal features;

in step S1003, a traffic knowledge graph is constructed according to the multidimensional feature information, the triplet information, the address matching result, and the road abnormality feature.

In a specific implementation, for view data of the road abnormality data type, images captured at checkpoints and images from inspection videos are annotated, and the InternVL multimodal large model is fine-tuned on the annotated data to identify road abnormality features, such as missing guardrails, missing central isolation barriers, conflicting signs and markings, long steep slopes, obscured traffic signs, damaged facilities, incorrectly placed signs, too many road openings and missing speed bumps. The events are also graded by combining the road grade and the event characteristics.

The attribute description of the road abnormality feature is shown in table 9:

TABLE 9

In one possible implementation manner, the multidimensional feature information, the triplet information, the address matching result, the road abnormal feature and the road state index are fused to construct a traffic knowledge graph.
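As a rough picture of the fusion, the different outputs can be turned into (head, relation, tail) edges that share the address and its Geohash code as join points. The schema below is entirely hypothetical: the relation names, the function name and the sample values are illustrative and are not taken from the application, and the multidimensional feature information is omitted for brevity.

```python
def build_graph_edges(triplet, geohash_code, road_state, abnormal_features):
    """Fuse one accident triplet with road state and abnormality data into edges.

    triplet: (accident_address, accident_cause, accident_result)
    geohash_code: address matching result for the accident address
    road_state: dict of road state index name -> value for that Geohash region
    abnormal_features: list of road abnormality labels for that region
    """
    address, cause, result = triplet
    edges = [
        (cause, "occurs_at", address),
        (cause, "causes", result),
        (address, "encoded_as", geohash_code),
    ]
    for name, value in road_state.items():
        edges.append((geohash_code, f"has_{name}", value))
    for feature in abnormal_features:
        edges.append((geohash_code, "has_abnormality", feature))
    return edges


edges = build_graph_edges(
    triplet=("Renmin Road and Jiefang Road intersection",
             "pedestrians run red lights",
             "traffic accident risk"),
    geohash_code="ws18g5v",              # illustrative code
    road_state={"avg_speed": 32.5},      # illustrative value
    abnormal_features=["missing guardrail"],
)
for edge in edges:
    print(edge)
```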

Based on the same inventive concept, an embodiment of the present application further provides a traffic knowledge graph construction device. Fig. 11 is a schematic structural diagram of the traffic knowledge graph construction device provided by the embodiment of the present application. The device includes:

The data acquisition module 1101 is configured to acquire traffic data of a plurality of data types, where the data types include address data, traffic base data, and traffic accident data;

The feature extraction module 1102 is configured to perform feature extraction of a preset dimension on the traffic base data to obtain multidimensional feature information of the traffic base data;

The triplet extraction module 1103 is configured to perform triplet extraction on the traffic accident data by using the fine-tuned large model to obtain triplet information;

The address matching module 1104 is used for performing text matching on the address data to obtain a first matching result, performing place name matching on the address data and a plurality of address information in a standard address library to obtain a second matching result, performing standard address coding and place name position matching on the first matching result and the second matching result respectively to obtain a first place name position similarity corresponding to the first matching result and a second place name position similarity corresponding to the second matching result, and performing similarity calculation on the first place name position similarity and the second place name position similarity to obtain an address matching result;

The graph construction module 1105 is configured to construct a traffic knowledge graph according to the multidimensional feature information, the triplet information and the address matching result.

In one possible implementation, the triplet extraction module 1103 is specifically configured to:

In one possible implementation, the apparatus further includes an index extraction module 1106;

the data acquisition module 1101 is specifically configured to acquire traffic flow data of a road status data type;

The index extraction module 1106 is used for performing data filtering and space-time unification on the traffic flow data to obtain preprocessed traffic flow data;

The graph construction module 1105 is configured to construct a traffic knowledge graph according to the multidimensional feature information, the triplet information, the address matching result and the road status index.

In one possible implementation, the data obtaining module 1101 is configured to obtain view data of a road abnormal data type;

The feature extraction module 1102 is configured to perform road abnormal feature extraction on the view data to obtain road abnormal features;

The graph construction module 1105 is configured to construct a traffic knowledge graph according to the multidimensional feature information, the triplet information, the address matching result and the road abnormal feature.

In one possible implementation, the address matching module 1104 is specifically configured to:

Based on the same inventive concept, the embodiment of the application further provides an electronic device, and fig. 12 is a schematic structural diagram of the electronic device provided by the embodiment of the application, as shown in fig. 12, including a processor 1201, a communication interface 1202, a memory 1203 and a communication bus 1204, where the processor 1201, the communication interface 1202 and the memory 1203 complete communication with each other through the communication bus 1204;

The memory 1203 stores a computer program, which when executed by the processor 1201, causes the processor 1201 to execute the steps of any one of the traffic knowledge graph construction methods provided by the embodiments of the present application.

Because the principle by which the electronic device solves the problem is similar to that of the traffic knowledge graph construction method, the implementation of the electronic device may refer to the method embodiments, and repeated details are omitted.

The communication bus of the electronic device mentioned above may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus and so on. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface 1202 is used for communication between the above electronic device and other devices. The memory may include Random Access Memory (RAM) or Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP) and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.

On the basis of the above embodiments, the embodiments of the present application further provide a computer readable storage medium, in which a computer program executable by a processor is stored, where the program when executed on the processor causes the processor to execute the steps of any one of the traffic knowledge graph construction methods provided in the embodiments of the present application.

Based on the same inventive concept, the embodiment of the present application provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the steps of the method for constructing any traffic knowledge graph provided by the embodiment of the present application are implemented.

Since the principle by which the computer readable storage medium solves the problem is similar to that of the traffic knowledge graph construction method, the implementation of the computer readable storage medium may refer to the method embodiments, and repeated details are omitted.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A traffic knowledge graph construction method, characterized by comprising the following steps:

2. The method of claim 1, wherein performing triplet extraction on the traffic accident data using the trimmed large model to obtain triplet information comprises:

3. The method according to claim 2, wherein the method further comprises:

4. The method according to claim 1, wherein the method further comprises:

Acquiring traffic flow data of a road state data type;

Performing data filtering and spatio-temporal unification on the traffic flow data to obtain preprocessed traffic flow data;

5. The method according to claim 1, wherein the method further comprises:

Acquiring view data of a road anomaly data type;

6. The method of claim 1, wherein performing text matching on the address data to obtain a first matching result comprises:

7. The method according to claim 1, wherein performing place name matching on the address data and the plurality of address information in the standard address library to obtain a second matching result comprises:

8. A traffic knowledge graph construction device, characterized in that the device comprises:

9. An electronic device, characterized in that it comprises a processor for implementing the steps of the traffic knowledge graph construction method according to any one of claims 1-7 when executing a computer program stored in a memory.

10. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the steps of the traffic knowledge graph construction method according to any one of claims 1-7.