CN114595307B - Method, device, storage medium and electronic device for constructing word vector matrix in logistics industry - Google Patents

Info

Publication number
CN114595307B
Authority
CN
China
Prior art keywords
enterprise
fence
word vector
context
upstream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210134797.9A
Other languages
Chinese (zh)
Other versions
CN114595307A (en)
Inventor
赵岩
蔡抒扬
夏曙东
孙智彬
张志平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xinglu Chelian Technology Co ltd
Original Assignee
Beijing Transwiseway Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Transwiseway Information Technology Co Ltd
Priority to CN202210134797.9A
Publication of CN114595307A
Application granted
Publication of CN114595307B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Databases & Information Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses a method, device, storage medium and electronic device for constructing a word vector matrix for the logistics industry. The method includes: constructing a fence sequence and an enterprise upstream and downstream relationship diagram based on vehicle stop point data and enterprise fence data; constructing a word vector for each fence in the fence sequence; generating multiple groups of context words for each enterprise based on the enterprise upstream and downstream relationship diagram; mapping the target word vector corresponding to each group of context words in the multiple groups of context words for each enterprise from the word vector of each fence; performing model training based on the target word vector corresponding to each group of context words to generate a logistics industry word vector matrix. Since the present application constructs the upstream and downstream relationship of enterprises, and then constructs the semantic relationship of words in enterprise labels, a semantic vector representation unique to logistics words is generated by means of natural language processing, so as to better calculate the similarity between logistics words and the similarity of entities represented by corresponding words, while improving the accuracy of semantic representation.

Description

Logistics industry word vector matrix construction method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to a method and an apparatus for constructing a word vector matrix in a logistics industry, a storage medium, and an electronic device.
Background
In recent years, in order to apply natural language processing to fields such as semantic analysis, sentiment analysis and information retrieval, words are usually expressed as one-dimensional or multi-dimensional word vectors, which are then further processed by a computing device.
In the logistics industry, cargo vocabulary is currently represented in one-hot form. For example, vegetables, fruit and steel are represented by the one-hot vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1), so the similarity between any two of the three words is 0. In an actual freight scenario, however, the similarity between vegetables and fruit is far higher, from the viewpoint of cargo type, than the similarity between vegetables and steel or between fruit and steel. The existing vocabulary vector representation in the logistics industry therefore ignores the semantic information of the vocabulary and cannot represent the correlation between words, which reduces the accuracy of semantic representation.
Disclosure of Invention
The embodiment of the application provides a method and a device for constructing a word vector matrix in the logistics industry, a storage medium and electronic equipment. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
In a first aspect, an embodiment of the present application provides a method for constructing a word vector matrix in a logistics industry, where the method includes:
constructing a fence sequence and an enterprise upstream and downstream relation diagram according to the vehicle stop point data and the enterprise fence data;
constructing a word vector for each fence in the sequence of fences;
generating multiple groups of context vocabularies of each enterprise according to the enterprise upstream and downstream relation diagram;
mapping a target word vector corresponding to each group of context vocabulary in the plurality of groups of context vocabulary of each enterprise from the word vector of each fence;
and performing model training based on the target word vectors corresponding to each group of context vocabulary to generate a logistics industry word vector matrix.
Optionally, constructing a fence sequence and an enterprise upstream and downstream relation diagram according to the vehicle stop point data and the enterprise fence data includes:
acquiring the vehicle stop point data and the enterprise fence data;
associating the vehicle stop point data with the enterprise fence data to convert the vehicle stop point data into a fence sequence;
and generating the enterprise upstream and downstream relation diagram according to the fence sequence.
Optionally, generating the enterprise upstream and downstream relation diagram according to the fence sequence includes:
determining the relationships between adjacent fences in the fence sequence, and generating a relation diagram;
identifying, in the relation diagram, identical relationships and relationships whose number of occurrences is smaller than a preset threshold;
and merging the identical relationships, and eliminating the relationships whose number of occurrences is smaller than the preset threshold, to obtain the enterprise upstream and downstream relation diagram.
Optionally, constructing a word vector for each fence in the fence sequence includes:
determining the point-of-interest (POI) type labels and cargo type labels corresponding to the enterprise entity according to the vocabulary of the enterprise entity corresponding to each fence in the fence sequence;
matching initial vectors of the words in the POI type labels and the cargo type labels from a preset word vector space;
and fusing the initial vectors of the words in the POI type labels and the cargo type labels to generate a word vector for each fence.
Optionally, generating multiple groups of context vocabularies of each enterprise according to the enterprise upstream and downstream relation diagram includes:
performing a breadth-first search on each enterprise node in the enterprise upstream and downstream relation diagram in the upstream and downstream directions to obtain a depth tree of each enterprise;
determining, for each path from the root node to a leaf node in the depth tree of each enterprise, the vocabulary context relation corresponding to that path;
and permuting and combining the words in each vocabulary context relation to generate the multiple groups of context vocabularies of each enterprise.
Optionally, performing model training based on the target word vectors corresponding to each group of context vocabulary to generate a logistics industry word vector matrix includes:
inputting the target word vectors corresponding to each group of context vocabulary into a preset word embedding model, and outputting a plurality of target values;
and generating the logistics industry word vector matrix according to the plurality of target values.
Optionally, generating the logistics industry word vector matrix according to the plurality of target values includes:
summing the plurality of target values to generate a model loss value;
when the model loss value reaches a preset threshold, outputting the parameter matrix of the middle layer of the word embedding training model;
and determining the parameter matrix of the middle layer as the logistics industry word vector matrix.
In a second aspect, an embodiment of the present application provides a device for constructing a word vector matrix in a logistics industry, where the device includes:
the data construction module is used for constructing a fence sequence and an enterprise upstream and downstream relation diagram according to the vehicle stop point data and the enterprise fence data;
the word vector construction module is used for constructing a word vector of each fence in the fence sequence;
the vocabulary generating module is used for generating a plurality of groups of context vocabularies of each enterprise according to the enterprise upstream-downstream relation diagram;
the word vector mapping module is used for mapping target word vectors corresponding to each group of context vocabulary in the plurality of groups of context vocabulary of each enterprise from the word vectors of each fence;
the word vector matrix generation module is used for carrying out model training based on the target word vectors corresponding to each group of context vocabulary, and generating a logistics industry word vector matrix.
In a third aspect, embodiments of the present application provide a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to perform the above-described method steps.
In a fourth aspect, an embodiment of the present application provides an electronic device, which may include a processor and a memory, wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-described method steps.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
According to the embodiment of the application, a logistics industry word vector matrix construction device firstly constructs a fence sequence and an enterprise upstream and downstream relation diagram according to vehicle stop point data and enterprise fence data, then constructs word vectors of each fence in the fence sequence, then generates multiple groups of context words of each enterprise according to the enterprise upstream and downstream relation diagram, then maps out target word vectors corresponding to each group of context words in the multiple groups of context words of each enterprise from the word vectors of each fence, and finally carries out model training based on the target word vectors corresponding to each group of context words to generate a logistics industry word vector matrix. According to the application, by constructing the upstream and downstream relations of the enterprise, further constructing the semantic relations of the vocabularies in the enterprise tag and generating the special semantic vector representation of the logistics vocabularies by means of natural language processing, the similarity between the logistics vocabularies and the similarity of the entity represented by the corresponding vocabularies are calculated better, and the accuracy of semantic representation is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic flow chart of a method for constructing word vector matrix in logistics industry, which is provided by the embodiment of the application;
FIG. 2 is a schematic diagram of an enterprise depth tree according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an enterprise relationship path provided by an embodiment of the present application;
FIG. 4 is a diagram of an enterprise vocabulary context according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a word embedding training model provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a word vector matrix construction device in the logistics industry according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.
It should be understood that the described embodiments are merely some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention as detailed in the accompanying claims.
In the description of the present invention, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art. Furthermore, in the description of the present invention, unless otherwise indicated, "a plurality" means two or more. "and/or" describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate that there are three cases of a alone, a and B together, and B alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
The application provides a method and a device for constructing a word vector matrix in the logistics industry, a storage medium and electronic equipment, and aims to solve the problems in the related art described above. In the technical scheme provided by the application, the upstream and downstream relations of enterprises are constructed, the semantic relations of the words in the enterprise labels are constructed from them, and a semantic vector representation specific to logistics vocabulary is generated by means of natural language processing, so that the similarity between logistics words and the similarity of the entities represented by those words are calculated better and the accuracy of semantic representation is improved. The scheme is described in detail below by way of exemplary embodiments.
The method for constructing the word vector matrix of the logistics industry provided by the embodiment of the application is described in detail below with reference to fig. 1-5. The method can be implemented by a computer program and can run on a logistics industry word vector matrix construction device based on the von Neumann architecture. The computer program may be integrated into an application or may run as a standalone utility application.
Referring to fig. 1, a flow chart of a method for constructing a word vector matrix in a logistics industry is provided in an embodiment of the present application. As shown in fig. 1, the method according to the embodiment of the present application may include the following steps:
S101, constructing a fence sequence and an enterprise upstream and downstream relation diagram according to vehicle stop point data and enterprise fence data;
The vehicle stop point data is vehicle stop position information reported by the on-board electronic equipment at a preset period, and includes at least the longitude and latitude of each vehicle stop point. The enterprise fence data is electronic fence data constructed for each enterprise.
In the embodiment of the application, vehicle stop point data and enterprise fence data are firstly obtained, then the vehicle stop point data and the enterprise fence data are associated to convert the vehicle stop point data into a fence sequence, and finally an enterprise upstream and downstream relation diagram is generated according to the fence sequence.
Further, when generating the enterprise upstream and downstream relation diagram according to the fence sequence, the relationships between adjacent fences in the fence sequence are first determined to generate a relation diagram; identical relationships, and relationships whose number of occurrences is smaller than a preset threshold, are then identified in the relation diagram; finally, the identical relationships are merged and the relationships whose number of occurrences is smaller than the preset threshold are eliminated to obtain the enterprise upstream and downstream relation diagram.
In one possible implementation, the vehicle stop data and the enterprise fence data are associated to convert the vehicle stop sequence into a fence sequence, then the same relationship is merged according to the relationship between adjacent fences, and the relationship with the number lower than the threshold k is removed to obtain an enterprise upstream and downstream relationship diagram.
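As a rough illustration of step S101, the following Python sketch converts stop points into fence sequences and then builds a thresholded relation graph. It is a minimal sketch under stated assumptions, not the patented implementation: the rectangular fence layout, the fence names, and the helper functions `locate_fence`, `to_fence_sequence` and `build_relation_graph` are all hypothetical, and real enterprise fences may be arbitrary polygons.

```python
from collections import Counter

# Hypothetical data layout: each fence is an axis-aligned bounding box
# (min_lon, min_lat, max_lon, max_lat). Real fences may be arbitrary polygons.
FENCES = {
    "farm_A": (116.00, 39.00, 116.10, 39.10),
    "market_B": (116.50, 39.50, 116.60, 39.60),
    "port_C": (117.00, 38.90, 117.10, 39.00),
}


def locate_fence(lon, lat, fences=FENCES):
    """Return the id of the first fence containing the point, or None."""
    for fence_id, (lo_lon, lo_lat, hi_lon, hi_lat) in fences.items():
        if lo_lon <= lon <= hi_lon and lo_lat <= lat <= hi_lat:
            return fence_id
    return None


def to_fence_sequence(stops, fences=FENCES):
    """Convert one vehicle's time-ordered (lon, lat) stops into a fence sequence.

    Stops outside every fence are dropped and consecutive repeats of the same
    fence are collapsed; both choices are assumptions of this sketch.
    """
    sequence = []
    for lon, lat in stops:
        fence_id = locate_fence(lon, lat, fences)
        if fence_id is not None and (not sequence or sequence[-1] != fence_id):
            sequence.append(fence_id)
    return sequence


def build_relation_graph(fence_sequences, k):
    """Count adjacent (upstream, downstream) pairs; keep those seen at least k times."""
    edge_counts = Counter()
    for seq in fence_sequences:
        edge_counts.update(zip(seq, seq[1:]))  # identical relations merge into one count
    return {edge: n for edge, n in edge_counts.items() if n >= k}


trips = [
    [(116.05, 39.05), (116.55, 39.55), (117.05, 38.95)],
    [(116.05, 39.05), (116.06, 39.06), (116.55, 39.55)],
    [(116.55, 39.55), (116.05, 39.05)],
]
sequences = [to_fence_sequence(trip) for trip in trips]
graph = build_relation_graph(sequences, k=2)
print(graph)  # {('farm_A', 'market_B'): 2}
```

Counting edges with a `Counter` performs the merge of identical relations and the threshold filter in a single pass; only the farm-to-market relation survives here because the other two relations occur once each.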
S102, constructing word vectors of each fence in the fence sequence;
The application constructs two kinds of fence labels for each fence. The enterprise entity corresponding to each fence in the fence sequence carries a plurality of labels: one kind is POI type labels, such as mining area, farmland, port and construction site, and the other kind is cargo labels, such as seafood, steel, mineral powder and container.
In the embodiment of the application, when the word vector of each fence in the fence sequence is constructed, the POI type labels and cargo type labels corresponding to the enterprise entity are first determined according to the vocabulary of the enterprise entity corresponding to each fence in the fence sequence; the initial vectors of the words in the POI type labels and the cargo labels are then matched from a preset word vector space; finally, the initial vectors of the words in the POI type labels and the cargo labels are fused to generate the word vector of each fence.
It should be noted that the labels are generated by keyword mapping, and each word in a label is represented by a one-hot vector.
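The label fusion in step S102 can be sketched as follows. The text does not specify the fusion operator, so this sketch assumes an element-wise maximum over the one-hot vectors (a multi-hot vector); the vocabulary `VOCAB` and the functions `one_hot` and `fence_vector` are hypothetical names introduced only for illustration.

```python
import numpy as np

# Hypothetical label vocabulary: POI type words followed by cargo words.
VOCAB = ["port", "farmland", "construction site", "seafood", "steel", "container"]
WORD_INDEX = {word: i for i, word in enumerate(VOCAB)}


def one_hot(word):
    """Initial vector of a label word: 1 at the word's index, 0 elsewhere."""
    vec = np.zeros(len(VOCAB))
    vec[WORD_INDEX[word]] = 1.0
    return vec


def fence_vector(poi_labels, cargo_labels):
    """Fuse the one-hot vectors of all labels of one fence.

    Assumption: fusion is an element-wise maximum, which yields a multi-hot
    vector. Other fusion operators (sum, mean) would fit the text equally well.
    """
    fused = np.zeros(len(VOCAB))
    for word in list(poi_labels) + list(cargo_labels):
        fused = np.maximum(fused, one_hot(word))
    return fused


port_fence = fence_vector(["port"], ["seafood", "container"])
print(port_fence)
```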
S103, generating multiple groups of context vocabularies of each enterprise according to the enterprise upstream and downstream relation diagram;
In the embodiment of the application, when generating multiple groups of context vocabularies of each enterprise according to the enterprise upstream and downstream relation diagram, a breadth-first search is first performed on each enterprise node in the enterprise upstream and downstream relation diagram in the upstream and downstream directions to obtain a depth tree of each enterprise; the vocabulary context relation corresponding to each path from the root node to a leaf node in the depth tree of each enterprise is then determined; finally, the words in each vocabulary context relation are permuted and combined to generate the multiple groups of context vocabularies of each enterprise.
Further, when model training is performed based on the target word vectors corresponding to each group of context vocabulary to generate the logistics industry word vector matrix, the target word vectors corresponding to each group of context vocabulary are first input into a preset word embedding model to output a plurality of target values, and the logistics industry word vector matrix is then generated according to the plurality of target values.
In one possible implementation, a breadth-first search algorithm is used to search each enterprise node in the enterprise upstream and downstream relation diagram to 2 degrees in the upstream and downstream directions, so that each enterprise generates a tree with a depth of 3, as shown in fig. 2. Because each enterprise carries type labels and cargo labels, a set of relationships between words can be established indirectly for each path from the root node to a leaf node in the tree. For example, taking the enterprise relationship path enterprise A => enterprise B => enterprise C shown in fig. 3, the context vocabulary of the path can be generated as shown in fig. 4, and finally 2×3×3=18 vocabulary combinations are generated. In particular, to solve the OOV (Out-Of-Vocabulary) problem caused by a missing upstream or downstream enterprise, two words, "start place" and "stop place", are reserved; where no upstream or downstream vocabulary relationship exists, the missing position is filled with these two words respectively.
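The path enumeration and the 2×3×3=18 combination count can be sketched in Python as below. This is a sketch under assumptions: the graph is a plain adjacency dictionary of downstream neighbours, the label sets for enterprises A, B and C are invented for illustration, and the padding policy (short paths padded with a reserved word up to length 3) is one plausible reading of the text. The functions `root_to_leaf_paths` and `context_groups` are hypothetical.

```python
from collections import deque
from itertools import product

START, STOP = "start place", "stop place"  # reserved padding words from the text


def root_to_leaf_paths(graph, root, max_hops=2):
    """Breadth-first expansion of `root` along directed edges, up to max_hops.

    Returns every root-to-leaf path of the resulting tree (depth max_hops + 1).
    """
    paths = []
    queue = deque([[root]])
    while queue:
        path = queue.popleft()
        successors = [n for n in graph.get(path[-1], []) if n not in path]
        if len(path) - 1 == max_hops or not successors:
            paths.append(path)  # reached a leaf of the depth tree
            continue
        for nxt in successors:
            queue.append(path + [nxt])
    return paths


def context_groups(path, labels, length=3, pad=STOP, pad_front=False):
    """Cartesian product of the label words of the enterprises along one path.

    Assumption: a path shorter than `length` is padded with a reserved word,
    at the end (missing downstream) or at the front (missing upstream).
    """
    word_sets = [labels[node] for node in path]
    padding = [[pad]] * (length - len(path))
    word_sets = padding + word_sets if pad_front else word_sets + padding
    return list(product(*word_sets))


# Hypothetical downstream graph and labels: A has 2 labels, B and C have 3 each.
graph = {"A": ["B"], "B": ["C"]}
labels = {
    "A": ["farmland", "vegetables"],
    "B": ["market", "vegetables", "fruit"],
    "C": ["port", "container", "seafood"],
}

paths = root_to_leaf_paths(graph, "A")
groups = context_groups(paths[0], labels)
print(paths, len(groups))
```

For the upstream direction the same functions can be applied to the reversed graph, reversing each resulting path and padding with `START` at the front.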
S104, mapping target word vectors corresponding to each group of context vocabulary in the plurality of groups of context vocabulary of each enterprise from the word vectors of each fence;
In one possible implementation, after the word vector of each fence is obtained according to step S102 and the multiple groups of context vocabularies of each enterprise are obtained according to step S103, the target word vector corresponding to each group of context vocabulary in the multiple groups of context vocabularies of each enterprise may be mapped from the word vector of each fence for use in model training.
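One way to picture this mapping step is a lookup that turns every word of a context group into its one-hot vector. The sketch below assumes that, because each label word is one-hot encoded, the target vectors can be recovered by index lookup; the vocabulary, the example groups and the function `groups_to_one_hot` are hypothetical.

```python
import numpy as np


def groups_to_one_hot(groups, vocab):
    """Map each context group (a tuple of words) to a stack of one-hot vectors.

    The result has shape (number of groups, words per group, vocabulary size).
    """
    index = {word: i for i, word in enumerate(vocab)}
    identity = np.eye(len(vocab))
    return np.stack([identity[[index[w] for w in group]] for group in groups])


vocab = ["start place", "farmland", "vegetables", "market", "stop place"]
groups = [
    ("farmland", "vegetables", "market"),
    ("start place", "farmland", "vegetables"),
]
targets = groups_to_one_hot(groups, vocab)
print(targets.shape)
```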
S105, model training is carried out based on the target word vectors corresponding to each group of context vocabulary, and a logistics industry word vector matrix is generated.
In the embodiment of the application, firstly, target word vectors corresponding to each group of context words are input into a preset word embedding model, a plurality of target values are output, and finally, a logistics industry word vector matrix is generated according to the plurality of target values.
Specifically, when the logistics industry word vector matrix is generated according to the plurality of target values, the target values are first summed to generate a model loss value; when the model loss value reaches a preset threshold, the parameter matrix of the middle layer of the word embedding training model is output; finally, the parameter matrix of the middle layer is determined to be the logistics industry word vector matrix.
The preset word embedding model may be a CBOW model.
In one possible implementation, as shown in fig. 5, CBOW is used as the word embedding training model: the one-hot vectors corresponding to each group of context vocabulary are the input, and the target value y_j is the output. The context window size is 3, where w_{t-1} and w_{t+1} are the context words and w_t is the word to be predicted. Finally, the outputs y_j are summed and it is judged whether the sum reaches a preset value; when it does, the parameter matrix of the middle layer (hidden layer) is output.
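A CBOW model with a window of 3 can be sketched in plain NumPy as follows. This is an illustrative sketch rather than the model of fig. 5: it assumes a softmax output with cross-entropy loss, treats the summed per-group loss as the "summed target values", and uses hypothetical names and hyperparameters (`train_cbow`, `dim`, `lr`, `epochs`, `loss_threshold`) that do not appear in the text.

```python
import numpy as np


def train_cbow(groups, vocab, dim=2, lr=0.1, epochs=200, loss_threshold=None, seed=0):
    """Train a tiny CBOW model on (w_{t-1}, w_t, w_{t+1}) word triples.

    Returns the hidden-layer parameter matrix (one row per word) and the
    summed loss of every epoch.
    """
    index = {word: i for i, word in enumerate(vocab)}
    rng = np.random.default_rng(seed)
    w_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # hidden-layer matrix
    w_out = rng.normal(scale=0.1, size=(dim, len(vocab)))  # output-layer matrix
    history = []
    for _ in range(epochs):
        total = 0.0
        for left, target, right in groups:
            ctx = [index[left], index[right]]
            # Multiplying a one-hot vector by w_in selects a row, so the
            # hidden activation is the mean of the two context rows.
            h = w_in[ctx].mean(axis=0)
            scores = h @ w_out
            scores -= scores.max()  # numerical stability for the softmax
            probs = np.exp(scores) / np.exp(scores).sum()
            total += -np.log(probs[index[target]])  # cross-entropy of this group
            grad_scores = probs.copy()
            grad_scores[index[target]] -= 1.0
            grad_h = w_out @ grad_scores  # computed before w_out is updated
            w_out -= lr * np.outer(h, grad_scores)
            np.subtract.at(w_in, ctx, lr * grad_h / len(ctx))
        history.append(total)
        if loss_threshold is not None and total <= loss_threshold:
            break  # summed loss reached the preset value
    return w_in, history


vocab = ["start place", "farmland", "vegetables", "fruit", "market", "stop place"]
groups = [
    ("farmland", "vegetables", "market"),
    ("farmland", "fruit", "market"),
    ("start place", "farmland", "vegetables"),
    ("vegetables", "market", "stop place"),
]
matrix, history = train_cbow(groups, vocab)
print(matrix.shape, history[0], history[-1])
```

The returned `matrix` plays the role of the hidden-layer parameter matrix: row i is the dense vector of word i, which is what multiplying that word's one-hot vector by the matrix would produce.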
Further, after the logistics industry word vector matrix is obtained, semantic similarity recommendation can be performed according to the matrix. First, a fence sequence and an enterprise upstream and downstream relation diagram are constructed according to the vehicle stop point data to be matched and the enterprise fence data, and the word vector of each fence in the fence sequence is constructed. Next, multiple groups of context vocabularies of each enterprise are generated according to the enterprise upstream and downstream relation diagram, and the word vectors to be converted, corresponding to each group of context vocabulary in the multiple groups of context vocabularies of each enterprise, are mapped from the word vector of each fence. The word vectors to be converted are then converted into a plurality of target vectors according to the logistics industry word vector matrix, and the target vectors are combined in pairs. Finally, the similarity within each pair of target vectors is calculated to obtain a plurality of similarities; after the similarities are sorted, the items corresponding to a preset percentage of the highest similarities are selected for recommendation.
For example, vegetables, fruit and steel are represented in one-hot form by the target word vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1), and the similarity between any two of them is 0. Using the logistics industry word vector matrix, the three words can be converted into new vector representations such as (0.9, 0.1), (0.8, 0.1) and (0.1, 0.9), in which the similarity between vegetables and fruit is higher than that between vegetables and steel or between fruit and steel. Recommendation is finally performed through similarity calculation.
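The vegetables, fruit and steel example can be checked numerically with the short Python sketch below. It assumes cosine similarity as the similarity measure (the text does not name one) and uses the illustrative dense values from the example as the rows of a stand-in word vector matrix; `cosine` and `top_similar_pairs` are hypothetical helper names.

```python
from itertools import combinations

import numpy as np


def cosine(u, v):
    """Cosine similarity of two vectors (assumed similarity measure)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def top_similar_pairs(vectors, top_fraction=0.5):
    """Score all pairs, sort by similarity, keep the top fraction (at least one)."""
    scored = [(a, b, cosine(vectors[a], vectors[b])) for a, b in combinations(vectors, 2)]
    scored.sort(key=lambda item: item[2], reverse=True)
    keep = max(1, int(len(scored) * top_fraction))
    return scored[:keep]


words = ["vegetables", "fruit", "steel"]
one_hot = np.eye(3)
# Stand-in word vector matrix whose rows are the illustrative values in the text.
matrix = np.array([[0.9, 0.1], [0.8, 0.1], [0.1, 0.9]])
dense = one_hot @ matrix  # converting one-hot vectors selects matrix rows

one_hot_vectors = dict(zip(words, one_hot))
dense_vectors = dict(zip(words, dense))

print(top_similar_pairs(one_hot_vectors, top_fraction=1.0))  # every similarity is 0.0
print(top_similar_pairs(dense_vectors))  # the vegetables/fruit pair ranks first
```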
According to the embodiment of the application, a logistics industry word vector matrix construction device firstly constructs a fence sequence and an enterprise upstream and downstream relation diagram according to vehicle stop point data and enterprise fence data, then constructs word vectors of each fence in the fence sequence, then generates multiple groups of context words of each enterprise according to the enterprise upstream and downstream relation diagram, then maps out target word vectors corresponding to each group of context words in the multiple groups of context words of each enterprise from the word vectors of each fence, and finally carries out model training based on the target word vectors corresponding to each group of context words to generate a logistics industry word vector matrix. According to the application, by constructing the upstream and downstream relations of the enterprise, further constructing the semantic relations of the vocabularies in the enterprise tag and generating the special semantic vector representation of the logistics vocabularies by means of natural language processing, the similarity between the logistics vocabularies and the similarity of the entity represented by the corresponding vocabularies are calculated better, and the accuracy of semantic representation is improved.
The following are examples of the apparatus of the present invention that may be used to perform the method embodiments of the present invention. For details not disclosed in the embodiments of the apparatus of the present invention, please refer to the embodiments of the method of the present invention.
Referring to fig. 6, a schematic structural diagram of a device for constructing a word vector matrix in the logistics industry according to an exemplary embodiment of the present invention is shown. The logistics industry word vector matrix construction device may be implemented as all or part of an electronic device through software, hardware or a combination of the two. The device 1 comprises a data construction module 10, a word vector construction module 20, a vocabulary generation module 30, a word vector mapping module 40 and a word vector matrix generation module 50.
a data construction module 10, configured to construct a fence sequence and an enterprise upstream and downstream relationship graph according to vehicle stop point data and enterprise fence data;
a word vector construction module 20, configured to construct a word vector for each fence in the fence sequence;
a vocabulary generation module 30, configured to generate multiple groups of context vocabulary for each enterprise according to the enterprise upstream and downstream relationship graph;
a word vector mapping module 40, configured to map, from the word vector of each fence, the target word vector corresponding to each group of context vocabulary in the multiple groups of context vocabulary of each enterprise; and
a word vector matrix generation module 50, configured to perform model training based on the target word vector corresponding to each group of context vocabulary to generate a logistics industry word vector matrix.
It should be noted that, when the logistics industry word vector matrix construction device provided in the above embodiment executes the logistics industry word vector matrix construction method, the division into the above functional modules is used only as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device embodiment and the method embodiment for constructing the logistics industry word vector matrix belong to the same concept; the detailed implementation process is described in the method embodiment and is not repeated here.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The invention also provides a computer readable medium on which program instructions are stored; when executed by a processor, the program instructions implement the logistics industry word vector matrix construction method provided by the above method embodiments. The invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the logistics industry word vector matrix construction method of each method embodiment.
Referring to fig. 7, a schematic structural diagram of an electronic device is provided in an embodiment of the present application. As shown in fig. 7, the electronic device 1000 may include at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002.
The communication bus 1002 is used to enable connection and communication between these components.
The user interface 1003 may include a display screen (Display) and a camera (Camera); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface.
The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface).
The processor 1001 may include one or more processing cores. The processor 1001 connects the various parts of the electronic device 1000 using various interfaces and lines, and performs the functions of the electronic device 1000 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 1005 and by invoking the data stored in the memory 1005. Optionally, the processor 1001 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 1001 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is responsible for rendering and drawing the content to be displayed on the display screen; and the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the processor 1001 and may instead be implemented by a separate chip.
The memory 1005 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). Optionally, the memory 1005 includes a non-transitory computer-readable storage medium. The memory 1005 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1005 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above method embodiments, and the like; the data storage area may store the data referred to in the above method embodiments. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in fig. 7, the memory 1005, as one type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a logistics industry word vector matrix construction application program.
In the electronic device 1000 shown in fig. 7, the user interface 1003 is mainly used to provide an input interface for the user and to obtain data input by the user, and the processor 1001 may be used to call the logistics industry word vector matrix construction application program stored in the memory 1005 and specifically perform the following operations:
constructing a fence sequence and an enterprise upstream and downstream relationship graph according to vehicle stop point data and enterprise fence data;
constructing a word vector for each fence in the fence sequence;
generating multiple groups of context vocabulary for each enterprise according to the enterprise upstream and downstream relationship graph;
mapping, from the word vector of each fence, the target word vector corresponding to each group of context vocabulary in the multiple groups of context vocabulary of each enterprise; and
performing model training based on the target word vector corresponding to each group of context vocabulary to generate a logistics industry word vector matrix.
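The mapping operation in the list above can be pictured as a dictionary lookup. The sketch below assumes that every item in a context group is keyed by the identifier of the fence it came from, and that groups containing an unknown key are skipped; both choices, and the name `map_groups_to_vectors`, are illustrative assumptions rather than details given in the application.

```python
from typing import Dict, List, Sequence


def map_groups_to_vectors(
    context_groups: Sequence[Sequence[str]],
    fence_vectors: Dict[str, List[float]],
) -> List[List[List[float]]]:
    """Replace each key in each context group by its fence word vector."""
    mapped = []
    for group in context_groups:
        # Assumption: a group is only usable if every key has a vector.
        if all(key in fence_vectors for key in group):
            mapped.append([fence_vectors[key] for key in group])
    return mapped
```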
In one embodiment, the processor 1001, when executing the construction of the fence sequence and the enterprise upstream and downstream relationship graph from the vehicle stop data and the enterprise fence data, specifically performs the following operations:
acquiring the vehicle stop point data and the enterprise fence data;
associating the vehicle stop point data with the enterprise fence data to convert the vehicle stop point data into a fence sequence; and
generating the enterprise upstream and downstream relationship graph according to the fence sequence.
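The association step can be sketched as a spatial lookup that replaces each stop point with the fence that contains it. The example below is only an illustration under simplifying assumptions: fences are axis-aligned bounding boxes (real enterprise fences are likely polygons), stops are `(timestamp, longitude, latitude)` tuples for a single vehicle, and consecutive stops inside the same fence are collapsed. The names `Fence` and `stops_to_fence_sequence` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Fence:
    fence_id: str
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        # Assumption: a fence is approximated by its bounding box.
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


def stops_to_fence_sequence(
    stops: Sequence[Tuple[float, float, float]],
    fences: Sequence[Fence],
) -> List[str]:
    """Convert one vehicle's stop points into a time-ordered fence sequence."""
    sequence: List[str] = []
    for _, lon, lat in sorted(stops):  # sort by timestamp
        hit = next((f.fence_id for f in fences if f.contains(lon, lat)), None)
        if hit is None:
            continue  # stop point outside every enterprise fence
        if sequence and sequence[-1] == hit:
            continue  # assumption: repeated stops in one fence count once
        sequence.append(hit)
    return sequence
```

With many fences, a spatial index would normally replace the linear scan; the scan is kept here for brevity.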
In one embodiment, the processor 1001, when executing the generation of the enterprise upstream and downstream relationship graph from the fence sequence, specifically performs the following operations:
determining the relationships between adjacent fences in the fence sequence to generate a relationship graph;
identifying, in the relationship graph, identical relationships and relationships whose number of occurrences is smaller than a preset threshold; and
merging the identical relationships and eliminating the relationships whose number of occurrences is smaller than the preset threshold, to obtain the enterprise upstream and downstream relationship graph.
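The three operations above can be sketched with a counter over adjacent fence pairs. This is a minimal illustration, assuming that a relationship is a directed pair `(upstream, downstream)`, that merging identical relationships means counting their occurrences, and that self-loops are ignored; the function name and the default threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def build_relation_graph(
    fence_sequences: Sequence[Sequence[str]],
    min_count: int = 2,
) -> Dict[Tuple[str, str], int]:
    """Build a weighted, directed upstream/downstream edge set."""
    edge_counts: Counter = Counter()
    for seq in fence_sequences:
        # Adjacent fences in a sequence form one upstream -> downstream relation.
        for upstream, downstream in zip(seq, seq[1:]):
            if upstream != downstream:
                edge_counts[(upstream, downstream)] += 1  # merges identical relations
    # Eliminate relations observed fewer than min_count times.
    return {edge: n for edge, n in edge_counts.items() if n >= min_count}
```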
In one embodiment, the processor 1001, when executing the constructing of a word vector for each fence in the fence sequence, specifically performs the following operations:
determining, according to the vocabulary of the enterprise entity corresponding to each fence in the fence sequence, the point-of-interest type tag and the goods type tag of that enterprise entity;
matching, from a preset word vector space, the initial vector of each word in the point-of-interest type tag and the goods type tag; and
fusing the initial vectors of the words in the point-of-interest type tag and the goods type tag to generate the word vector of each fence.
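The application does not state how the initial vectors are fused. One simple possibility, shown here purely as an assumption, is element-wise averaging of the vectors of all tag words found in the preset word vector space; the function name `fuse_fence_vector` and the dictionary-based vector space are likewise hypothetical.

```python
from typing import Dict, List, Sequence


def fuse_fence_vector(
    poi_tag_words: Sequence[str],
    goods_tag_words: Sequence[str],
    word_vector_space: Dict[str, List[float]],
) -> List[float]:
    """Fuse tag-word vectors into one fence vector by averaging (assumed)."""
    vectors = [
        word_vector_space[word]
        for word in [*poi_tag_words, *goods_tag_words]
        if word in word_vector_space  # words missing from the space are skipped
    ]
    if not vectors:
        raise ValueError("no tag word was found in the word vector space")
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
```

Weighted averaging or concatenation would be equally plausible fusion choices.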
In one embodiment, the processor 1001, when executing the generating of multiple groups of context vocabulary for each enterprise according to the enterprise upstream and downstream relationship graph, specifically performs the following operations:
performing a breadth-first search from each enterprise node in the enterprise upstream and downstream relationship graph along the upstream and downstream directions to obtain a depth tree for each enterprise;
determining, for each path from the root node to a leaf node in the depth tree of each enterprise, the vocabulary context relationship corresponding to that path; and
permuting and combining the vocabulary in each vocabulary context relationship to generate the multiple groups of context vocabulary of each enterprise.
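A possible reading of these operations is sketched below. It assumes the graph is an adjacency dictionary for one direction (the reversed adjacency would be passed for the other direction), that the tree is cut off at a hypothetical `max_depth`, and that "permuting and combining" means taking one word from each enterprise along a root-to-leaf path. All names are illustrative.

```python
from collections import deque
from itertools import product
from typing import Dict, List, Sequence, Tuple


def bfs_tree_paths(graph: Dict[str, Sequence[str]], root: str,
                   max_depth: int = 2) -> List[List[str]]:
    """Breadth-first search from root; return every root-to-leaf path."""
    parent = {root: None}
    depth = {root: 0}
    children: Dict[str, List[str]] = {root: []}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in graph.get(node, []):
            if nxt in parent:
                continue  # each enterprise appears once in the tree
            parent[nxt] = node
            depth[nxt] = depth[node] + 1
            children[node].append(nxt)
            children[nxt] = []
            queue.append(nxt)
    paths = []
    for node, kids in children.items():
        if not kids:  # leaf: walk back up to the root
            path, cur = [], node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            paths.append(path[::-1])
    return paths


def context_groups(paths: Sequence[Sequence[str]],
                   vocab: Dict[str, Sequence[str]]) -> List[Tuple[str, ...]]:
    """Combine one tag word per enterprise along each path (assumed scheme)."""
    groups: List[Tuple[str, ...]] = []
    for path in paths:
        groups.extend(product(*(vocab[enterprise] for enterprise in path)))
    return groups
```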
In one embodiment, the processor 1001, when executing the model training based on the target word vector corresponding to each group of context vocabulary to generate the logistics industry word vector matrix, specifically performs the following operations:
inputting the target word vector corresponding to each group of context vocabulary into a preset word embedding model, and outputting a plurality of target values; and
generating the logistics industry word vector matrix according to the plurality of target values.
In one embodiment, the processor 1001, when executing the generation of the logistics industry word vector matrix from the plurality of target values, specifically performs the following operations:
summing the plurality of target values to generate a model loss value;
outputting the parameter matrix of the middle layer of the word embedding model when the model loss value reaches a preset threshold; and
determining the parameter matrix of the middle layer as the logistics industry word vector matrix.
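The application does not name the preset word embedding model. The sketch below substitutes a small skip-gram style model with a full softmax, written in plain Python, to illustrate the loop described above: each (center, context) pair yields one target value (a negative log-likelihood), the target values are summed into the model loss, and training stops once the loss reaches a threshold, at which point the middle-layer parameter matrix is returned. The model choice, the hyperparameters, and the use of the target word vectors as the initial middle-layer weights are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def train_word_embedding(
    init_vectors: Sequence[Sequence[float]],
    pairs: Sequence[Tuple[int, int]],
    lr: float = 0.1,
    loss_threshold: float = 0.5,
    max_epochs: int = 200,
    seed: int = 0,
) -> Tuple[List[List[float]], List[float]]:
    """Train on (center, context) index pairs; return (middle layer, losses)."""
    rng = random.Random(seed)
    vocab_size, dim = len(init_vectors), len(init_vectors[0])
    w_mid = [list(v) for v in init_vectors]  # middle-layer parameter matrix
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
             for _ in range(vocab_size)]
    loss_history: List[float] = []
    for _ in range(max_epochs):
        target_values = []
        for center, context in pairs:
            h = list(w_mid[center])
            scores = [sum(a * b for a, b in zip(h, row)) for row in w_out]
            top = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            probs = [e / total for e in exps]
            target_values.append(-math.log(probs[context]))
            grad_h = [0.0] * dim
            for j, row in enumerate(w_out):
                err = probs[j] - (1.0 if j == context else 0.0)
                for k in range(dim):
                    grad_h[k] += err * row[k]   # uses the pre-update weight
                    row[k] -= lr * err * h[k]
            for k in range(dim):
                w_mid[center][k] -= lr * grad_h[k]
        loss = sum(target_values)  # model loss = sum of the target values
        loss_history.append(loss)
        if loss <= loss_threshold:
            break
    return w_mid, loss_history
```

Row `i` of the returned matrix is the trained vector for vocabulary index `i`. A production system would more likely use an existing embedding library with negative sampling than a full softmax.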
Those skilled in the art will appreciate that all or part of the methods in the above embodiments may be implemented by a computer program instructing related hardware. The program for constructing the logistics industry word vector matrix may be stored in a computer readable storage medium and, when executed, may include the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (9)

1. The method for constructing the word vector matrix in the logistics industry is characterized by comprising the following steps:
constructing a fence sequence and an enterprise upstream and downstream relationship graph according to vehicle stop point data and enterprise fence data;
constructing a word vector for each fence in the fence sequence, wherein
the constructing a word vector for each fence in the fence sequence comprises:
determining, according to the vocabulary of the enterprise entity corresponding to each fence in the fence sequence, the point-of-interest type tag and the goods type tag of that enterprise entity;
matching, from a preset word vector space, the initial vector of each word in the point-of-interest type tag and the goods type tag;
fusing the initial vectors of the words in the point-of-interest type tag and the goods type tag to generate the word vector of each fence;
generating multiple groups of context vocabulary for each enterprise according to the enterprise upstream and downstream relationship graph, wherein
the generating multiple groups of context vocabulary for each enterprise according to the enterprise upstream and downstream relationship graph comprises:
searching each enterprise node in the enterprise upstream and downstream relationship graph along the upstream and downstream directions to obtain a depth tree for each enterprise;
determining, for each path in the depth tree of each enterprise, the vocabulary context relationship corresponding to that path;
permuting and combining the vocabulary in the vocabulary context relationship to generate the multiple groups of context vocabulary of each enterprise;
mapping, from the word vector of each fence, the target word vector corresponding to each group of context vocabulary in the multiple groups of context vocabulary of each enterprise; and
performing model training based on the target word vector corresponding to each group of context vocabulary to generate a logistics industry word vector matrix.
2. The method of claim 1, wherein constructing a fence sequence and an enterprise upstream and downstream relationship graph from vehicle stop data and enterprise fence data comprises:
acquiring the vehicle stop point data and the enterprise fence data;
associating the vehicle stop point data with the enterprise fence data to convert the vehicle stop point data into a fence sequence; and
generating the enterprise upstream and downstream relationship graph according to the fence sequence.
3. The method of claim 2, wherein generating an enterprise upstream and downstream relationship graph from the fence sequence comprises:
determining the relationships between adjacent fences in the fence sequence to generate a relationship graph;
identifying, in the relationship graph, identical relationships and relationships whose number of occurrences is smaller than a preset threshold; and
merging the identical relationships and eliminating the relationships whose number of occurrences is smaller than the preset threshold, to obtain the enterprise upstream and downstream relationship graph.
4. The method of claim 1, wherein the search is a breadth-first search and each path is a path from the root node to a leaf node.
5. The method of claim 1, wherein the performing model training based on the target word vectors corresponding to each set of context vocabulary to generate a logistics industry word vector matrix comprises:
inputting target word vectors corresponding to each group of context words into a preset word embedding model, and outputting a plurality of target values;
and generating a logistics industry word vector matrix according to the target values.
6. The method of claim 5, wherein generating a logistics industry word vector matrix from the plurality of target values comprises:
summing the plurality of target values to generate a model loss value;
outputting the parameter matrix of the middle layer of the word embedding model when the model loss value reaches a preset threshold; and
determining the parameter matrix of the middle layer as the logistics industry word vector matrix.
7. A logistics industry word vector matrix construction device implemented using the method of any one of claims 1 to 6, characterized in that the device comprises:
a data construction module, configured to construct a fence sequence and an enterprise upstream and downstream relationship graph according to vehicle stop point data and enterprise fence data;
a word vector construction module, configured to construct a word vector for each fence in the fence sequence;
a vocabulary generation module, configured to generate multiple groups of context vocabulary for each enterprise according to the enterprise upstream and downstream relationship graph;
a word vector mapping module, configured to map, from the word vector of each fence, the target word vector corresponding to each group of context vocabulary in the multiple groups of context vocabulary of each enterprise; and
a word vector matrix generation module, configured to perform model training based on the target word vector corresponding to each group of context vocabulary to generate a logistics industry word vector matrix.
8. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method of any of claims 1-6.
9. An electronic device comprising a processor and a memory, wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method according to any of claims 1-6.
CN202210134797.9A 2022-02-14 2022-02-14 Method, device, storage medium and electronic device for constructing word vector matrix in logistics industry Active CN114595307B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210134797.9A CN114595307B (en) 2022-02-14 2022-02-14 Method, device, storage medium and electronic device for constructing word vector matrix in logistics industry

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210134797.9A CN114595307B (en) 2022-02-14 2022-02-14 Method, device, storage medium and electronic device for constructing word vector matrix in logistics industry

Publications (2)

Publication Number Publication Date
CN114595307A CN114595307A (en) 2022-06-07
CN114595307B true CN114595307B (en) 2025-01-24

Family

ID=81804905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210134797.9A Active CN114595307B (en) 2022-02-14 2022-02-14 Method, device, storage medium and electronic device for constructing word vector matrix in logistics industry

Country Status (1)

Country Link
CN (1) CN114595307B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 100176 block a, qianfang building, No.27, Zhongguancun Software Park, No.8, Dongbeiwang West Road, Haidian District, Beijing

Patentee after: Beijing Tranwiseway Information Technology Co.,Ltd.

Country or region after: China

Address before: 100176 block a, qianfang building, No.27, Zhongguancun Software Park, No.8, Dongbeiwang West Road, Haidian District, Beijing

Patentee before: BEIJING TRANWISEWAY INFORMATION TECHNOLOGY Co.,Ltd.

Country or region before: China

TR01 Transfer of patent right

Effective date of registration: 20250701

Address after: 101116 Beijing Tongzhou District, Beilangying Village, No. 1 Building 2, 3rd Floor, Room 101

Patentee after: Beijing Xinglu Chelian Technology Co.,Ltd.

Country or region after: China

Address before: 100176 block a, qianfang building, No.27, Zhongguancun Software Park, No.8, Dongbeiwang West Road, Haidian District, Beijing

Patentee before: Beijing Tranwiseway Information Technology Co.,Ltd.

Country or region before: China