CN107818076B - Semantic processing for natural language - Google Patents
- Publication number
- CN107818076B (application CN201610818984.3A)
- Authority
- CN
- China
- Prior art keywords
- items
- item
- semantic
- quantized
- dimensions
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
Abstract
In an embodiment of the disclosure, a method and a device for semantic processing of natural language are provided. After obtaining a set of items comprising a plurality of items, a quantized representation of each item in one set of semantic dimensions and another quantized representation in another set of semantic dimensions are determined; the quantized representations in the two sets of dimensions are then used to generate a semantic value for each item. According to embodiments of the present disclosure, the semantic values can be used to determine semantic relatedness between different items in the set of items, and each quantized representation can be shared by multiple items in the set of items. By enabling a plurality of items to share the same quantized representation, the size of the semantic model can be effectively reduced and the speed of semantic processing of natural language can be markedly improved.
Description
Background
Natural language processing refers to a technique for processing human language using a computer, which enables the computer to understand the human language. The computer is trained through a manually labeled or unlabeled corpus to generate semantic representation of natural language. Natural language processing is an important direction in the field of machine learning, and can be applied to semantic analysis, information retrieval, machine translation, language modeling, chat robots, and the like.
A recurrent neural network (RNN) is a neural network whose nodes are directionally connected into a cycle, so that its internal state can exhibit dynamic temporal behavior. Unlike a typical non-recurrent neural network, an RNN can use its internal memory to process input sequences of arbitrary length. The RNN memorizes previous information and applies it to the computation of the current output: the nodes between hidden layers in the neural network are connected, and the input of a hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. Accordingly, RNNs are suited to applications that predict the word or text at the next moment, such as text generation, machine translation, speech recognition, and image description. However, current RNN-based semantic processing requires a large amount of memory, and its processing speed is slow.
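The recurrence described above can be sketched in a few lines. This is only a toy illustration with scalar weights (the names `rnn_step`, `W_xh`, and `W_hh` are ours, not from the disclosure); a real RNN uses weight matrices and vector-valued hidden states.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh):
    # h_t = tanh(W_xh * x_t + W_hh * h_prev): the new hidden state mixes
    # the current input with the memory carried over from the previous step.
    return math.tanh(W_xh * x_t + W_hh * h_prev)

# The hidden state carries information from earlier inputs forward in time.
h = 0.0
for x in [1.0, 0.5, -0.3]:
    h = rnn_step(x, h, W_xh=0.8, W_hh=0.5)
```

Because `h` depends on every earlier input, the network can condition its prediction of the next word on the whole preceding sequence.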
Disclosure of Invention
The inventors have noted that a large amount of corpus data can be obtained from a network, which is easily collected. The semantic model trained based on a large amount of corpus data can cover most semantic scenes, so that the method can be effectively applied to actual natural language processing application. Based on this recognition, unlike conventional approaches that use a set of multi-dimensional vectors to represent semantic values for each item (e.g., word), embodiments of the present disclosure use two or more sets of multi-dimensional vectors to represent semantic values for items. With the two or more sets of multi-dimensional vectors, the model size and processing speed of the semantic model can be optimized, which differs significantly from any known scheme in both working principle and mechanism.
For example, according to an embodiment of the present disclosure, an item set including a plurality of items may be obtained. Then, a semantic vector for each item is represented together using more than two sub-vectors, wherein the semantic vector can be used to determine semantic relevance between different items in the set of items, and each sub-vector can be shared by a plurality of items in the set of items, respectively. Therefore, by enabling a plurality of items to share the same sub-vector, the embodiment of the disclosure can not only effectively reduce the size of the semantic model in the semantic processing process for the natural language, but also significantly improve the speed of the semantic processing.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the disclosure, nor is it intended to be used to limit the scope of the disclosure.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference characters designate like or similar elements, and wherein:
FIG. 1 illustrates a block diagram of a computing system/server in which one or more embodiments of the present disclosure may be implemented;
FIG. 2 shows a flow diagram of a method for generating semantic values according to an embodiment of the present disclosure;
FIGS. 3A and 3B illustrate example diagrams of tables for a set of items according to embodiments of the present disclosure;
FIG. 4 shows a flow diagram of a method for allocating a plurality of items in a table according to an embodiment of the present disclosure;
FIG. 5 illustrates an example diagram for some rows in a table of a set of items, according to an embodiment of the disclosure;
FIG. 6 shows a flow diagram of a method for determining an associated item in accordance with an embodiment of the present disclosure; and
FIG. 7 illustrates an example diagram of a prediction process based on a semantic model of RNN in accordance with an embodiment of the disclosure.
Throughout the drawings, the same or similar reference numbers refer to the same or similar elements.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
In general, the semantic modeling process of natural language described herein may be considered a machine learning process. Machine learning is a type of algorithm that automatically analyzes and obtains rules from data and predicts unknown data by using the rules. The term "semantic model" as used herein refers to a model built from a priori knowledge associated with the syntax, grammar, morphology, etc. of a particular language, which may be used to determine semantic associations between words. The term "training process" or "learning process" as used herein refers to a process that utilizes a corpus or training data to optimize a semantic model. For example, the semantic model may gradually optimize the semantic value of each item through training or learning, thereby improving the accuracy of the semantic model. In the context of the present disclosure, the terms "training" or "learning" may be used interchangeably for purposes of discussion convenience. As used herein, a "vector" is also referred to as an "embedding vector" and is used to map the semantics of each item into a multidimensional space to form a semantic quantized representation.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment". Relevant definitions for other terms will be given in the following description.
Traditionally, in a machine learning process, a semantic model trains on items (e.g., words or other character strings) in a set of items (e.g., a vocabulary) using a corpus to determine a semantic value for each item, typically represented by a vector. However, when the number of items in the corpus is very large, the semantic model also becomes very large. For example, when training an RNN-based semantic model, a one-hot vector must be generated whose dimension equals the number of items in the set of items, and each item usually also has its own embedding vector. When the number of items in the set reaches the tens of millions, the semantic model reaches a size of tens of gigabytes (GB), far exceeding the memory of existing computing devices (e.g., GPUs and mobile computing devices such as cell phones and tablets), so that these devices cannot train the semantic model. The traditional semantic modeling method therefore causes the semantic model to become too large.
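To see why model size explodes, a back-of-the-envelope estimate helps. The function below is our own illustration (the vocabulary size and dimensionality are assumed figures, not values from the disclosure): a conventional embedding matrix for ten million items at 1024 dimensions already needs about 41 GB for a single float32 matrix.

```python
def embedding_size_gb(vocab_size, dim, bytes_per_float=4):
    # One dedicated embedding vector per item: a vocab_size x dim float matrix.
    return vocab_size * dim * bytes_per_float / 1e9

# Illustrative: 10 million items, 1024-dimensional float32 vectors.
size = embedding_size_gb(10_000_000, 1024)
```

This counts only the embedding table; the output softmax layer of an RNN language model has a matrix of the same shape, roughly doubling the figure.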
Conventional methods generate one high-dimensional vector for each item in the vocabulary to represent its semantic value. For example, Table 1 shows a conventional vector representation: the vector x1 is generated for the item "January", the vector x2 for the item "February", and so on, where x1, x2, etc. are semantic representations of the corresponding items in a multidimensional space. The distance between vectors can represent the degree of semantic relatedness between items: a shorter distance between two vectors indicates a stronger association between the two items. For example, because the semantic association of "January" and "one" is greater than the semantic association of "January" and "two", the distance between x1 and x5 is smaller than the distance between x1 and x6.
TABLE 1
| Vector | Item |
| x1 | January |
| x2 | February |
| … | … |
| x5 | one |
| x6 | two |
| … | … |
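The conventional scheme of Table 1 can be sketched as a plain dictionary of per-item vectors. The 3-dimensional toy values below are our own, chosen only so that the month pair lies closer together than a month/number pair:

```python
import math

# One dedicated vector per item, as in Table 1 (toy 3-dimensional values).
embeddings = {
    "January":  [0.90, 0.10, 0.20],
    "February": [0.85, 0.15, 0.25],
    "one":      [0.10, 0.90, 0.30],
    "two":      [0.12, 0.88, 0.35],
}

def distance(a, b):
    # Euclidean distance; a shorter distance means stronger relatedness.
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

d_months = distance(embeddings["January"], embeddings["February"])
d_mixed  = distance(embeddings["January"], embeddings["two"])
```

With V items this dictionary holds V full vectors, which is exactly the storage cost the disclosure sets out to avoid.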
Furthermore, in RNN-based semantic models, the probability of each item in the set of items needs to be calculated when predicting the next item after the current item. However, when the number of items in the set of items reaches the level of tens of millions, the processing speed of the semantic model will also become very slow. Even with the fastest processors currently available, it may take as long as several decades to complete the training process for such a large set of items. Therefore, the conventional semantic modeling method requires a long training time.
Therefore, the present disclosure provides a semantic modeling method based on multiple sets of dimensions. According to an embodiment of the present disclosure, the semantic value of each item is generated using quantized representations in two or more sets of dimensions, such that the quantized representation in each set of dimensions can be shared by multiple items in the set of items. By way of example, Table 2 shows vector representations according to an embodiment of the disclosure in which each item is jointly represented by two vectors: the item "January" is jointly represented by the vectors x1 and y1, and the item "February" by the vectors x1 and y2, where x1 and x2 are quantized representations in a first set of dimensions, and y1 and y2 are quantized representations in a second set of dimensions.
TABLE 2
| Vector | Item |
| (x1,y1) | January |
| (x1,y2) | February |
| … | … |
| (x2,y1) | one |
| (x2,y2) | two |
| … | … |
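The shared representation of Table 2 can be sketched as follows. Each item stores only a (row, column) pair of indices into two small pools of sub-vectors; the 2-dimensional toy values are our own illustration:

```python
# Shared sub-vector pools, as in Table 2 (toy 2-dimensional values).
row_vectors = {"x1": [0.9, 0.1], "x2": [0.1, 0.9]}
col_vectors = {"y1": [0.3, 0.7], "y2": [0.4, 0.6]}

# Each item stores only two indices instead of a full private vector.
items = {
    "January":  ("x1", "y1"),
    "February": ("x1", "y2"),
    "one":      ("x2", "y1"),
    "two":      ("x2", "y2"),
}

def semantic_value(item):
    r, c = items[item]
    # The full representation is the concatenation of the shared sub-vectors.
    return row_vectors[r] + col_vectors[c]

jan = semantic_value("January")
feb = semantic_value("February")
```

Because "January" and "February" both reference x1, updating that one sub-vector affects both items at once, which is exactly what makes the table compact.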
In this way, by having multiple items share the same quantized representation (for example, the items "January" and "February" share the vector x1 in the first set of dimensions), the present disclosure not only effectively reduces the size of the semantic model during semantic modeling of natural language, but also markedly improves the processing speed of the semantic model.
The basic principles and several example implementations of the present disclosure are explained below with reference to fig. 1-7. FIG. 1 illustrates a block diagram of a computing system/server 100 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the computing system/server 100 illustrated in FIG. 1 is merely exemplary and should not constitute any limitation as to the functionality or scope of the embodiments described herein.
As shown in FIG. 1, computing system/server 100 is in the form of a general purpose computing device. Components of computing system/server 100 may include, but are not limited to, one or more processors or processing units 110, memory 120, storage 130, one or more communication units 140, one or more input devices 150, and one or more output devices 160. The processing unit 110 may be a real or virtual processor and can perform various processes according to programs stored in the memory 120. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capability of computing system/server 100.
Computing system/server 100 typically includes a number of computer storage media. Such media may be any available media that is accessible by computing system/server 100 and includes, but is not limited to, volatile and non-volatile media, removable and non-removable media. Memory 120 may be volatile memory (e.g., registers, cache, Random Access Memory (RAM)), non-volatile memory (e.g., Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory), or some combination thereof. Storage 130 may be a removable or non-removable medium and may include a machine-readable medium, such as a flash drive, a magnetic disk, or any other medium that may be capable of being used to store information and/or data (e.g., data 170) and that may be accessed within computing system/server 100.
The computing system/server 100 may further include additional removable/non-removable, volatile/nonvolatile storage media. Although not shown in FIG. 1, a magnetic disk drive for reading from or writing to a removable, nonvolatile magnetic disk and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data media interfaces. Memory 120 may include one or more program products 122 having one or more sets of program modules configured to perform the functions of the various embodiments described herein.
The communication unit 140 enables communication with another computing device over a communication medium. Additionally, the functionality of the components of computing system/server 100 may be implemented in a single computing cluster or multiple computing machines capable of communicating over a communications connection. Thus, the computing system/server 100 may operate in a networked environment using logical connections to one or more other servers, network Personal Computers (PCs), or another general network node.
The input device 150 may be one or more of a variety of input devices, such as a mouse, keyboard, trackball, or the like. Output device 160 may be one or more output devices such as a display, speakers, printer, or the like. Computing system/server 100 may also communicate with one or more external devices (not shown), such as storage devices, display devices, etc., as desired, through communication unit 140, with one or more devices that enable a user to interact with computing system/server 100, or with any device (e.g., network card, modem, etc.) that enables computing system/server 100 to communicate with one or more other computing devices. Such communication may be performed via input/output (I/O) interfaces (not shown).
As shown in fig. 1, storage device 130 has stored therein data 170, including training data 172 (e.g., a corpus), which computing system/server 100 can use to train to output training results 180 via output device 160. Logically, the training results 180 may be viewed as one or more tables that include a quantized representation (e.g., a vector) of each item in the training data over multiple sets of dimensions. Of course, it should be noted that outputting the training results 180 in the form of a table is merely exemplary, and is not intended to limit the scope of the present disclosure in any way. Any other additional or alternative data format is possible.
Several example embodiments of generating table 180 based on training data 172 are described in detail below. Fig. 2 shows a flow diagram of a method 200 for generating semantic values according to an embodiment of the present disclosure. It should be understood that the method 200 may be performed by the processing unit 110 described with reference to fig. 1.
At 202, a set of items comprising a plurality of items is obtained. For example, a corpus may be obtained as training data from the storage device 130, the communication unit 140, the input device 150, a network, or the like. The corpus contains stored language material that has actually appeared in real use, and typically comprises a collated data set of a large number of texts. Each item appearing in the corpus is treated as a token. In some embodiments, a deduplication operation may be performed on the tokens, and the deduplicated tokens are treated as the plurality of items in the set of items. In some embodiments, the set of items may include: words, such as the English "one" and "January" or the Chinese words for "year" and "hour"; phrases, such as "day-and-a-half"; and/or other character strings, such as numbers, web addresses, and alphanumeric combinations.
At 204, a quantized representation (referred to as a "first quantized representation") of each item in the set of items over a set of semantic dimensions (referred to as a "first set of semantic dimensions") and a quantized representation (referred to as a "second quantized representation") over another set of semantic dimensions (referred to as a "second set of semantic dimensions") is determined.
In some embodiments, the quantized representation may be implemented as a vector. For convenience of description, such embodiments are mainly described below. It will of course be understood that this is merely exemplary and is not intended to limit the scope of the present disclosure in any way; any other suitable data format is also possible. In addition, for the sake of distinction, a vector corresponding to a quantized representation is referred to as a "sub-vector". For example, the first quantized representation may be implemented as a first sub-vector x, while the second quantized representation may be implemented as a second sub-vector y.
The term "set of semantic dimensions" as used herein refers to a multidimensional space of a certain dimension. Accordingly, a quantized representation of an item refers to a semantic quantized representation of the item's semantic values in a multidimensional space, each of the semantic dimensions representing a sub-semantic value of the item in that dimension. Alternatively, the first set of semantic dimensions and the second set of semantic dimensions may have the same dimensions, such as 1000 dimensions or 1024 dimensions, and the first set of semantic dimensions may be rows in a table and the second set of semantic dimensions may be columns in a table. For example, a first subvector x of 1024 dimensions and a second subvector y of 1024 dimensions for each item in the set of items may be determined. An example embodiment of determining the first sub-vector and the second sub-vector in act 204 will also be described below with reference to fig. 4.
At 206, a semantic value (e.g., a vector) for each item is generated based at least on the first and second quantized representations, e.g., the semantic value for each item may be represented as a vector (x, y). According to embodiments of the present disclosure, the generated vector can be used to determine semantic relevance between the item and other items in the set of items. As described above, for example, the distance between vectors may represent the degree of semantic association between items, and the shorter the distance between vectors, the higher the association between two items, that is, the semantic relationship between items is quantified by the vectors.
For example, "image" and "picture" represent similar semantics, the vector distance between them is relatively short. For another example, because the relationship of "Beijing" and "China" is similar to the relationship of "Washington" and "America," the distance between the vector for "Beijing" and the vector for "China" is substantially equal to the distance between the vector for "Washington" and the vector for "America".
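The analogy above can be checked numerically on toy vectors. The 2-dimensional values below are constructed by us so that the capital-to-country offsets coincide exactly; real trained embeddings would only match approximately:

```python
# Toy vectors constructed so that capital->country offsets match.
v = {
    "Beijing":    [0.8, 0.2],
    "China":      [0.6, 0.5],
    "Washington": [0.7, 0.1],
    "America":    [0.5, 0.4],
}

def diff(a, b):
    # Element-wise offset between two item vectors.
    return [x - y for x, y in zip(v[a], v[b])]

# The Beijing->China offset equals the Washington->America offset here.
offset_cn = diff("Beijing", "China")
offset_us = diff("Washington", "America")
```

This is the sense in which vectors quantify relationships between items, not just similarity of individual items.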
Furthermore, as shown in Table 2 above, the first sub-vector x and/or the second sub-vector y are shared by a plurality of items in the set of items, i.e., multiple items may share the same first sub-vector x, and multiple items may share the same second sub-vector y.

It should be understood that although the method 200 represents the semantic value of each item by only two sub-vectors, it is also possible to represent the semantic value of each item by more than two sub-vectors. For example, the semantic value of each item may be represented by three sub-vectors (x, y, z), where z represents a further quantized representation of the item in a third set of semantic dimensions. That is, the scope of the present disclosure is not limited by the number of sub-vectors. Further, while the method 200 determines a vector for each item in the set of items, it may alternatively determine vectors for only a subset of the items.
In some embodiments, the sub-vectors of the first set of semantic dimensions and the sub-vectors of the second set of semantic dimensions may be represented in tabular form, although other data formats are possible. Referring to FIG. 3A, a table 300 for a set of items is shown, in accordance with an embodiment of the present disclosure. An example embodiment of generating table 300 is described in detail below with reference to FIG. 4. In the embodiments described herein, a row vector xr and a column vector xc in table 300 jointly represent an item; the vectors xr and xc are therefore also referred to as sub-vectors, or as the row vector and the column vector, respectively.
As shown in FIG. 3A, all items in the set of items are organized into table 300, with each position in the table corresponding to one item. For example, in table 300 the vector of the word "January" is jointly represented by the sub-vector xr(1) of the first row and the sub-vector xc(1) of the first column, while the vector of the word "February" is jointly represented by the sub-vector xr(1) of the first row and the sub-vector xc(2) of the second column. It can be seen that the sub-vector xr(1) of the first row is shared at least by the words "January" and "February". In general, all items in the i-th row of table 300 have the same sub-vector xr(i) in the first set of semantic dimensions, and all items in the j-th column have the same sub-vector xc(j) in the second set of semantic dimensions. Thus, the item in row i and column j of table 300 is jointly represented by the sub-vectors xr(i) and xc(j).
FIG. 3B illustrates a table 350 for a set of items according to an embodiment of the disclosure. As shown in FIG. 3B, in table 350 the word at the position of row i and column j is jointly represented by the sub-vector xr(i) of the i-th row and the sub-vector xc(j) of the j-th column, i.e., by the vector (xr(i), xc(j)), where xr(i) and xc(j) are high-dimensional (e.g., 1024-dimensional) vectors. For example, according to embodiments of the present disclosure, the vector of the word "January" may be represented as (xr(1), xc(1)), while the vector of the word "February" may be represented as (xr(1), xc(2)).
Conventionally, for an item set having V items, one vector must be generated for each item to represent its semantic value, i.e., V vectors in total. However, as described in method 200 and FIGS. 3A-3B, a method according to embodiments of the present disclosure needs only about ⌈√V⌉ row vectors and ⌈√V⌉ column vectors, i.e., about 2√V vectors in total. In some cases, when the numbers of rows and columns in table 300 are not equal, the number of vectors is slightly larger than 2√V; but when V is on the order of tens of millions, the number of vectors of the present disclosure is far smaller than that of the conventional method. Thus, the present disclosure can effectively reduce the number of vectors, and hence the size of the semantic model; for example, the size of the semantic model can be reduced from a conventional 80 gigabytes (GB) to 70 megabytes (MB).
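The vector-count saving can be computed directly. A minimal sketch, assuming a near-square table with ⌈√V⌉ rows and ⌈√V⌉ columns:

```python
import math

def vector_counts(V):
    # Conventional scheme: one dedicated vector per item (V vectors).
    # Shared-table scheme: ceil(sqrt(V)) row vectors plus ceil(sqrt(V))
    # column vectors for a near-square table.
    side = math.ceil(math.sqrt(V))
    return V, 2 * side

# Ten-million-item vocabulary (illustrative figure).
conventional, shared = vector_counts(10_000_000)
```

For V = 10,000,000 the shared-table scheme needs 6,326 sub-vectors instead of 10,000,000 full vectors, which is where the GB-to-MB reduction comes from.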
FIG. 4 shows a flow diagram of a method 400 for allocating a plurality of items in a table in accordance with an embodiment of the present disclosure. Method 400 describes accurately assigning items in a table (e.g., table 300) based on training data and determining a vector value for each row and column in the table. It should be understood that method 400 may be a sub-action of action 204 in method 200 described above with reference to fig. 2, and may be performed by processing unit 110 described with reference to fig. 1.
At 402, the plurality of items in the set of items are organized into a table such that items in the same row of table 300 have the same row vector and items in the same column have the same column vector. In some embodiments, items with the same prefix may initially be assigned to the same row of the table, and items with the same suffix may initially be assigned to the same column. In some embodiments, for an English word or character string, the prefix is its leading part and the suffix its trailing part; for example, the words "exact", "real" and "return" are assigned to the same row, and the words "billion", "million" and "trillion" are assigned to the same column. Next, at 404-408, the allocation positions of the items in table 300 are adjusted using the training data set.
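The initial prefix/suffix grouping at 402 might be sketched as below. The helper name `initial_layout` and the 3-character affix length are our assumptions; the disclosure does not fix a particular affix length:

```python
from collections import defaultdict

def initial_layout(words, affix_len=3):
    # Group words sharing a prefix into candidate rows and words sharing
    # a suffix into candidate columns; this is only the starting layout,
    # later refined by training (acts 404-408).
    rows, cols = defaultdict(list), defaultdict(list)
    for w in words:
        rows[w[:affix_len]].append(w)
        cols[w[-affix_len:]].append(w)
    return rows, cols

rows, cols = initial_layout(["billion", "million", "trillion", "return", "really"])
```

The "-ion" words land in one column bucket, giving the subsequent optimization a sensible starting point.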
Specifically, at 404, the vectors are updated based on the assigned positions. For example, all row vectors and all column vectors in table 300 are trained using the corpus, based on the existing allocation positions, until the row vectors and/or column vectors converge. For example, the vector values of the similar items "image" and "picture" may be updated so that the distance between them in vector space becomes short. In this way, all or some of the row vectors xr and column vectors xc in table 300 can be updated; for example, the value of a row vector may be updated from (0.025, 0.365, 0.263, ...) to (0.012, 0.242, 0.347, ...).
At 406, the allocation positions are adjusted based on the updated vectors. In some embodiments, based on the values of all row and column vectors in table 300 determined at 404, the corpus may be used to adjust the allocation positions of all items in table 300 so as to minimize a loss function over all items. The loss function is a form of optimization objective used to learn or train the model and may represent an overall measure of loss for model optimization, where the loss may reflect factors such as misclassification errors. The allocation positions can be adjusted by minimizing the negative log-likelihood of the next word in the sequence; for example, over a context of T words, the overall negative log-likelihood NLL can be represented by equation (1):

NLL = Σ_{t=1}^{T} −log P(w_t | w_1 … w_{t−1}) = Σ_{w} NLL_w    (1)

where NLL_w is the negative log-likelihood for a particular word w.

For ease of description, NLL_w may be denoted l(w, r(w), c(w)), where (r(w), c(w)) is the position of the word w in the rows and columns of table 300, and l_r(w, r(w)) and l_c(w, c(w)) denote the row loss and the column loss of the word w for l(w, r(w), c(w)), respectively. The negative log-likelihood NLL_w of the word w can then be represented as:

NLL_w = Σ_{s ∈ S_w} ( l_r(w, r(w)) + l_c(w, c(w)) )    (2)

where S_w is the set of all positions at which the word w occurs in the corpus.

The position of the word w in table 300 may then be adjusted to minimize the loss function NLL. For example, assume the word w is moved from its original position (r(w), c(w)) to a new position (i, j) in table 300. Holding the allocation positions of the other words fixed, the row loss l_r(w, i) and the column loss l_c(w, j) can be computed separately, and l(w, i, j) is then determined as l_r(w, i) + l_c(w, j) in accordance with equation (2). Minimizing the loss function can then be converted to equation (3):

min Σ_w Σ_{i ∈ S_r} Σ_{j ∈ S_c} a(w, i, j) · l(w, i, j)    (3)

where a(w, i, j) = 1 indicates that the word w is assigned to the new position (i, j) in table 300, and S_r and S_c denote the sets of rows and columns of the table, respectively.
Thus, the above problem of minimizing the loss function can be equated with the standard minimum-weight perfect matching problem and can therefore be solved by, for example, a minimum-cost maximum-flow (MCMF) algorithm. Any currently known or future-developed MCMF method may be used in conjunction with embodiments of the present invention, and the scope of the present invention is not limited in this respect. By the MCMF method, the allocation position of each item in the table can be adjusted using the corpus, making the allocation positions of all items in the item set more accurate.
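For illustration, the assignment problem above can be solved exactly by brute force on a tiny instance. This stands in for the MCMF solver used at scale; the function name and the toy loss matrix are our own assumptions:

```python
from itertools import permutations

def best_assignment(loss):
    # loss[w][p]: loss of placing word w at table position p.
    # Brute force over permutations finds the minimum-weight perfect
    # matching; a real implementation would use min-cost max-flow.
    n = len(loss)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(loss[w][perm[w]] for w in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_perm, best_total

# Three words, three table positions (toy losses): each word clearly
# prefers one distinct position.
perm, total = best_assignment([[1, 9, 9], [9, 1, 9], [9, 9, 1]])
```

Brute force is O(n!) and only viable for toy sizes; MCMF solves the same matching in polynomial time, which is why the disclosure relies on it for tens of millions of items.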
At 408, it is determined whether a convergence condition is satisfied. For example, the convergence condition may be that the number of iterations has reached a predetermined number. Alternatively or additionally, the convergence condition may be that the iteration time has reached a predetermined duration. As another example, the convergence condition may be that the loss function begins to converge, i.e., that its rate of change falls below a threshold. If the convergence condition is not satisfied, acts 404-408 are repeated until it is satisfied. Once the convergence condition is met, at 410, the assigned positions of the plurality of items in table 300 and the values of all row and column vectors in table 300 are stored for subsequent use.
In this way, all the row and column vectors in the table are trained on the corpus, and the assigned positions of all items in the table are adjusted using the corpus, so that each item in the item set can be allocated to the most appropriate position in the table. The values of the row and column vectors trained according to embodiments of the disclosure are therefore highly accurate, which effectively ensures the accuracy of the semantic model.
FIG. 5 illustrates an example diagram of some rows in a table of a set of items according to an embodiment of this disclosure. For example, after several iterative adjustments, items in the same row and/or column in table 300 have semantic associations, or semantic and syntactic associations. For example, in row 832 (510), the assigned items are place names; in row 887 (520), the assigned items are web sites; in row 889, the assigned items are expressions relating to time. Therefore, the semantic modeling method according to the embodiment of the present disclosure can effectively find semantic relevance, or semantic and grammatical relevance, between items, and adjust the assigned position of each item in the table according to those semantics. Thus, according to embodiments of the disclosure, not only is the accuracy of the semantic model ensured, but the size of the semantic model can also be reduced.
FIG. 6 shows a flow diagram of a method 600 for determining associated items, in accordance with an embodiment of the present disclosure. According to the method 600, given one item (also referred to as a "first item"), another item (also referred to as a "second item") associated with it can be determined. It should be understood that method 600 may be performed after act 206 in method 200 described above with reference to fig. 2, and may be performed by processing unit 110 described with reference to fig. 1.
At 602, a third quantized representation associated with the item in the first set of semantic dimensions is determined based at least on the second quantized representation. For example, assume that the first item is the word "January", its first quantized representation is a row sub-vector x_r, and its second quantized representation is a column sub-vector x_c. Then, based on the sub-vector x_c and the current hidden state vector in the RNN, the sub-vector of the row most relevant to the first item (e.g., the row vector of row 2) is determined among all rows in the table 300. Then, at 604, a fourth quantized representation associated with the item in the second set of semantic dimensions is determined based at least on the third quantized representation. For example, based on the row vector determined at 602 and the current hidden state vector in the RNN, the sub-vector of the column most relevant to the first item (e.g., the column vector of column 1) is determined among all columns in the table 300. It should be appreciated that although the above example determines the row vector of the other item first and then its column vector, it is also possible to determine the column vector first and then the row vector.
At 606, the other item is determined based on the third quantized representation and the fourth quantized representation. For example, based on the determined row vector and column vector, the other item may be determined to be the item in the second row and first column of table 300, i.e., the word "one". In some embodiments, the first item may be the current word in a sentence, and the next word that will appear after the current word can be predicted based on the semantic vector of the current word.
Conventionally, for an item set having V items, determining the item most relevant to a first item requires computing the relevance of every item in the set to the first item and then selecting among all of the relevances, which takes a total time of V × t_0, where t_0 is the time to determine the relevance of one item to the first item. In a method according to an embodiment of the present disclosure, however, the V items are organized in a table of √V rows and √V columns, and one of the √V rows is selected first, followed by one of the √V columns, so that only 2√V × t_0 total time is required. In some cases, when the numbers of rows and columns in table 300 are not equal, the total time to determine the other item is slightly greater than 2√V × t_0. However, at values of V on the order of tens of millions, the processing time according to embodiments of the present disclosure is much less than that of conventional methods; for example, the processing time can be reduced from tens of days to several days or even several hours. The processing speed of the semantic model is thus effectively improved.
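The row-then-column selection above can be sketched in a few lines. This is an illustrative toy, not the patent's implementation: the vectors and hidden states are invented, and only the argmax step is shown (under a softmax output layer, the candidate with the highest inner product also has the highest probability).

```python
def argmax_by_score(hidden, vectors):
    """Return the index of the candidate vector with the highest inner
    product against the hidden state; under a softmax output layer this
    is also the most probable candidate."""
    scores = [sum(h * y for h, y in zip(hidden, vec)) for vec in vectors]
    return scores.index(max(scores))

# Toy model: V = 9 items in a 3x3 table, hidden size 2 (values invented).
row_vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]   # one vector per row
col_vectors = [[0.5, 0.5], [1.0, -1.0], [0.0, 1.0]]   # one vector per column
hidden = [0.2, 0.9]

r = argmax_by_score(hidden, row_vectors)   # scores 3 rows, not 9 items
c = argmax_by_score(hidden, col_vectors)   # scores 3 columns
print(r, c)   # only 2 * sqrt(V) = 6 candidates were scored in total
```

The point of the sketch is the candidate count: each of the two argmax calls scores √V vectors, so the predicted item at (r, c) is found after 2√V scorings instead of V.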
Fig. 7 illustrates an example diagram of an RNN-based prediction process in accordance with an embodiment of the disclosure. For example, the RNN-based prediction process described in fig. 7 may be used to predict the word at the current position t in a sentence based on data from the previous position t-1. The hidden state vector can be divided into a hidden state row vector h_r and a hidden state column vector h_c, and w_t denotes the word at the t-th position. Let n be the dimension of the input row vectors and input column vectors in the RNN semantic model, and let m be the dimension of the hidden state vectors, where the column vector of position t-1 satisfies x_c^{t-1} ∈ R^n, the row vector of position t satisfies x_r^t ∈ R^n, and the hidden state row vector at position t-1 satisfies h_r^{t-1} ∈ R^m. The hidden state column vector h_c^{t-1} of position t-1 and the hidden state row vector h_r^t of position t can then be determined by equation (4):

h_c^{t-1} = f(W·x_c^{t-1} + U·h_r^{t-1} + b)
h_r^t = f(W·x_r^t + U·h_c^{t-1} + b)        (4)
Here, W, U, and b represent affine transformation parameters, where W ∈ R^{m×n}, U ∈ R^{m×m}, and b ∈ R^m, and f denotes a non-linear activation function (e.g., the sigmoid function) in the neural network. In addition, the row and column vectors are both taken from the input vector matrices X_r, X_c ∈ R^{n×√V}.
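A minimal sketch of the two hidden-state updates of equation (4), with tiny invented dimensions (n = m = 2) and invented parameter values; it is meant only to show the data flow, not real trained weights.

```python
import math

def affine_step(W, U, b, x, h):
    """One update of equation (4): f(W.x + U.h + b), with f = sigmoid."""
    m, out = len(b), []
    for i in range(m):
        z = b[i]
        z += sum(W[i][j] * x[j] for j in range(len(x)))
        z += sum(U[i][j] * h[j] for j in range(len(h)))
        out.append(1.0 / (1.0 + math.exp(-z)))   # sigmoid activation
    return out

# Invented parameters: n = 2 (input dim), m = 2 (hidden dim).
W = [[0.1, 0.2], [0.3, 0.4]]       # W in R^{m x n}
U = [[0.5, 0.0], [0.0, 0.5]]       # U in R^{m x m}
b = [0.0, 0.0]                     # b in R^m

x_c_prev = [1.0, 0.0]              # column vector of position t-1
x_r_t    = [0.0, 1.0]              # row vector of position t
h_r_prev = [0.5, 0.5]              # hidden state row vector at t-1

h_c_prev = affine_step(W, U, b, x_c_prev, h_r_prev)   # first line of (4)
h_r_t    = affine_step(W, U, b, x_r_t, h_c_prev)      # second line of (4)
print([round(v, 3) for v in h_r_t])
```

Note how the two lines chain: the column hidden state of position t-1 is computed first and then feeds the row hidden state of position t, matching the zig-zag data flow described in the text.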
Then, the row probability P_r(w_t) of the word w at position t and the column probability P_c(w_t) of the word w at position t can be determined by equation (5):

P_r(w_t) = exp(h_c^{t-1} · y_{r(w)}^r) / Σ_{i∈S_r} exp(h_c^{t-1} · y_i^r)
P_c(w_t) = exp(h_r^t · y_{c(w)}^c) / Σ_{i∈S_c} exp(h_r^t · y_i^c)        (5)
Here, r(w) represents the row index of the word w, c(w) represents the column index of the word w, y_i^r ∈ R^m is the i-th vector of Y_r ∈ R^{m×√V}, y_i^c ∈ R^m is the i-th vector of Y_c ∈ R^{m×√V}, and S_r and S_c represent the sets of rows and columns in the table, respectively.
Therefore, the probability of the next word appearing in each row, and then its probability of appearing in each column, can be determined by equation (5), and the word located at the intersection of the highest-probability row and the highest-probability column is determined to be the most probable next word.
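The two softmax distributions of equation (5) can be sketched as below. The hidden states and output matrices are invented toy values; the point is that the row distribution is driven by h_c^{t-1} over Y_r, and the column distribution by h_r^t over Y_c.

```python
import math

def softmax_probs(hidden, vectors):
    """Equation (5)-style softmax: probability of each row (or column)
    given a hidden state, over the output vectors Y_r (or Y_c)."""
    logits = [sum(h * y for h, y in zip(hidden, vec)) for vec in vectors]
    peak = max(logits)                       # subtract max for stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Invented 2-dimensional hidden states and output vectors for a 3x3 table.
h_c_prev = [0.6, 0.4]                        # drives the row distribution
h_r_t    = [0.2, 0.8]                        # drives the column distribution
Y_r = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
Y_c = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]

p_rows = softmax_probs(h_c_prev, Y_r)
p_cols = softmax_probs(h_r_t, Y_c)
print(p_rows.index(max(p_rows)), p_cols.index(max(p_cols)))
```

The predicted word sits at the intersection of the highest-probability row and the highest-probability column, as the text above describes.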
It can be seen that, based on the column vector x_c^{t-1} of position t-1 and the hidden state row vector h_r^{t-1} at position t-1, the hidden state column vector h_c^{t-1} of position t-1 can be determined; then, based on h_c^{t-1}, the row probability P_r(w_t) of the word w at position t can be determined. That is, based at least on the column vector of a preceding word, the row vector of the current word may be determined. In addition, from the row vector x_r^t of position t and the hidden state column vector h_c^{t-1} of position t-1, the hidden state row vector h_r^t of position t can be determined; then, based on h_r^t, the column probability P_c(w_t) of the word w at position t can be determined. That is, based at least on the row vector of the current word, the column vector of the current word may be determined. Thus, for an item set with V items, the probability of a word is determined by separately computing per-row and per-column probabilities, enabling the time to predict the next item to be reduced from V × t_0 to 2√V × t_0, which effectively improves the processing speed of the RNN semantic model.
The methods and functions described herein may be performed, at least in part, by one or more hardware logic components. By way of example, and not limitation, illustrative types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Some example implementations of the present disclosure are listed below.
The present disclosure may be implemented as an electronic device comprising a processing unit and a memory coupled to the processing unit and storing instructions that, when executed by the processing unit, perform the following acts: obtaining an item set comprising a plurality of items; determining a first quantized representation of an item in the set of items over a first set of semantic dimensions and a second quantized representation over a second set of semantic dimensions; and generating a semantic value for the item based at least on the first and second quantized representations, wherein the semantic value is usable to determine semantic relevance between the item and other items in the set of items, at least one of the first and second quantized representations being shared by the item with at least one other item in the set of items.
In some embodiments, wherein the semantic values are represented by vectors, the first quantized representation is represented by a first sub-vector, and the second quantized representation is represented by a second sub-vector.
In some embodiments, the actions further include: determining another item in the set of items associated with the item based on the semantic value of the item.
In some embodiments, wherein the item is a first word in a sentence, the another item is a second word that will appear after the first word in the sentence, and determining the another item comprises: predicting the second term based on a semantic value of the first term.
In some embodiments, wherein determining the other item comprises: determining a third quantized representation associated with the item in the first set of semantic dimensions based at least on the second quantized representation; determining a fourth quantized representation associated with the item in the second set of semantic dimensions based at least on the third quantized representation; and determining the further item from the third and fourth quantized representations.
In some embodiments, wherein determining the first and second quantized representations comprises: organizing the plurality of items in the set of items into a table such that items in the table that are in a same row have the same quantized representation in the first set of semantic dimensions and items in a same column have the same quantized representation in the second set of semantic dimensions; and adjusting the allocation positions of the items in the table using a training data set.
In some embodiments, wherein organizing the plurality of items in the set of items into a table comprises: allocating items having the same prefix, which represents a previous portion of the item, to the same row in the table; and assigning items having the same suffix, which represents a latter portion of the item, to the same column in the table.
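The prefix/suffix bootstrap described above can be illustrated with a short sketch. The grouping rule (fixed-length character prefixes and suffixes) and the word list are invented for this example; the actual embodiment may define prefix and suffix differently, and this sketch ignores balancing the rows and columns to size √V.

```python
def bootstrap_table(items, k=2):
    """Hypothetical initial table layout: items sharing the same
    k-character prefix go to the same row, and items sharing the same
    k-character suffix go to the same column (no size balancing here)."""
    rows, cols = {}, {}
    for item in items:
        rows.setdefault(item[:k], []).append(item)   # same prefix -> same row
        cols.setdefault(item[-k:], []).append(item)  # same suffix -> same column
    return rows, cols

words = ["running", "runner", "walking", "walker"]
rows, cols = bootstrap_table(words)
print(sorted(rows))   # prefix groups: 'ru' and 'wa'
print(sorted(cols))   # suffix groups: 'er' and 'ng'
```

Such a layout gives the training stage a reasonable starting point, which the corpus-driven reallocation (described earlier) then refines.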
In some embodiments, wherein utilizing the training data set to adjust the assigned position of the item in the table comprises: iteratively performing the following operations at least once until a convergence condition is satisfied, the convergence condition being related to an iteration time, an iteration number, or a parameter change of the training model: updating the first and second quantized representations based on the assigned position; and adjusting the allocation location based on the updated first and second quantized representations.
The present disclosure may be implemented as a computer-implemented method, the method comprising: obtaining an item set comprising a plurality of items; determining a first quantized representation of an item in the set of items over a first set of semantic dimensions and a second quantized representation over a second set of semantic dimensions; and generating a semantic value for the item based at least on the first and second quantized representations, the semantic value usable to determine a semantic relevance between the item and other items in the set of items, at least one of the first and second quantized representations being shared by the item with at least one other item in the set of items.
In some embodiments, wherein the semantic values are represented by vectors, the first quantized representation is represented by a first sub-vector, and the second quantized representation is represented by a second sub-vector.
In some embodiments, the method further comprises: determining another item in the set of items associated with the item based on the semantic value of the item.
In some embodiments, wherein the item is a first word in a sentence, the another item is a second word that will appear after the first word in the sentence, and determining the another item comprises: predicting the second term based on a semantic value of the first term.
In some embodiments, wherein determining the other item comprises: determining a third quantized representation associated with the item in the first set of semantic dimensions based at least on the second quantized representation; determining a fourth quantized representation associated with the item in the second set of semantic dimensions based at least on the third quantized representation; and determining the further item from the third and fourth quantized representations.
In some embodiments, wherein determining the first and second quantized representations comprises: organizing the plurality of items in the set of items into a table such that items in the table that are in a same row have the same quantized representation in the first set of semantic dimensions and items in a same column have the same quantized representation in the second set of semantic dimensions; and adjusting the allocation positions of the items in the table using a training data set.
In some embodiments, wherein organizing the plurality of items in the set of items into a table comprises: allocating items having the same prefix, which represents a previous portion of the item, to the same row in the table; and assigning items having the same suffix, which represents a latter portion of the item, to the same column in the table.
In some embodiments, wherein utilizing the training data set to adjust the assigned position of the item in the table comprises: iteratively performing the following operations at least once until a convergence condition is satisfied, the convergence condition being related to an iteration time, an iteration number, or a parameter change of the training model: updating the first and second quantized representations based on the assigned position; and adjusting the allocation location based on the updated first and second quantized representations.
The disclosure may be implemented as a computer program product stored in a non-transitory computer storage medium and comprising machine executable instructions that, when executed in a device, cause the device to: obtaining an item set comprising a plurality of items; determining a first quantized representation of an item in the set of items over a first set of semantic dimensions and a second quantized representation over a second set of semantic dimensions; and generating a semantic value for the item based at least on the first and second quantized representations, the semantic value usable to determine a semantic relevance between the item and other items in the set of items, at least one of the first and second quantized representations being shared by the item with at least one other item in the set of items.
In some embodiments, wherein the semantic values are represented by vectors, the first quantized representation is represented by a first sub-vector, and the second quantized representation is represented by a second sub-vector.
In some embodiments, wherein the machine executable instructions, when executed in a device, further cause the device to: determining another item in the set of items associated with the item based on the semantic value of the item.
In some embodiments, wherein the item is a first word in a sentence, the another item is a second word that will appear after the first word in the sentence, and determining the another item comprises: predicting the second term based on a semantic value of the first term.
In some embodiments, wherein determining the other item comprises: determining a third quantized representation associated with the item in the first set of semantic dimensions based at least on the second quantized representation; determining a fourth quantized representation associated with the item in the second set of semantic dimensions based at least on the third quantized representation; and determining the further item from the third and fourth quantized representations.
In some embodiments, wherein determining the first and second quantized representations comprises: organizing the plurality of items in the set of items into a table such that items in the table that are in a same row have the same quantized representation in the first set of semantic dimensions and items in a same column have the same quantized representation in the second set of semantic dimensions; and adjusting the allocation positions of the items in the table using a training data set.
In some embodiments, wherein organizing the plurality of items in the set of items into a table comprises: allocating items having the same prefix, which represents a previous portion of the item, to the same row in the table; and assigning items having the same suffix, which represents a latter portion of the item, to the same column in the table.
In some embodiments, wherein utilizing the training data set to adjust the assigned position of the item in the table comprises: iteratively performing the following operations at least once until a convergence condition is satisfied, the convergence condition being related to an iteration time, an iteration number, or a parameter change of the training model: updating the first and second quantized representations based on the assigned position; and adjusting the allocation location based on the updated first and second quantized representations.
Although the disclosure has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
Claims (20)
1. An electronic device, comprising:
a processing unit;
a memory coupled to the processing unit and storing instructions that, when executed by the processing unit, perform the following:
obtaining an item set comprising a plurality of items;
determining a first quantized representation of items in the set of items over a first set of semantic dimensions and a second quantized representation over a second set of semantic dimensions, wherein the first set of semantic dimensions represents a multi-dimensional space having a first number of dimensions and the second set of semantic dimensions represents a multi-dimensional space having a second number of dimensions; and
generating a semantic value for the item based at least on the first and second quantized representations, the semantic value usable to determine a semantic relevance between the item and other items in the set of items, at least one of the first and second quantized representations being shared by the item with at least one other item in the set of items.
2. The apparatus of claim 1, wherein the semantic values are represented by vectors, the first quantized representation is represented by a first sub-vector, and the second quantized representation is represented by a second sub-vector.
3. The apparatus of claim 1, the acts further comprising:
determining another item in the set of items associated with the item based on the semantic value of the item.
4. The apparatus of claim 3, wherein the item is a first word in a sentence, the other item is a second word in the sentence that will occur after the first word, and determining the other item comprises:
predicting the second term based on a semantic value of the first term.
5. The apparatus of claim 3, wherein determining the other item comprises:
determining a third quantized representation associated with the item in the first set of semantic dimensions based at least on the second quantized representation;
determining a fourth quantized representation associated with the item in the second set of semantic dimensions based at least on the third quantized representation; and
determining the other item from the third and fourth quantized representations.
6. The apparatus of claim 1, wherein determining the first and second quantized representations comprises:
organizing the plurality of items in the set of items into a table such that items in the table that are in a same row have the same quantized representation in the first set of semantic dimensions and items in a same column have the same quantized representation in the second set of semantic dimensions; and
adjusting the allocation locations of the items in the table using a training data set.
7. The apparatus of claim 6, wherein organizing the plurality of items in the set of items into a table comprises:
allocating items having the same prefix, which represents a previous portion of the item, to the same row in the table; and
items with the same suffix are assigned to the same column in the table, the suffix representing a latter part of the item.
8. The apparatus of claim 6, wherein utilizing a training data set to adjust the assigned position of the item in the table comprises:
iteratively performing the following at least once until a convergence condition is satisfied, the convergence condition being related to at least one of: iteration time, iteration times and parameter changes of the training model:
updating the first and second quantized representations based on the assigned position; and
adjusting the allocation location based on the updated first and second quantized representations.
9. A computer-implemented method, comprising:
obtaining an item set comprising a plurality of items;
determining a first quantized representation of items in the set of items over a first set of semantic dimensions and a second quantized representation over a second set of semantic dimensions, wherein the first set of semantic dimensions represents a multi-dimensional space having a first number of dimensions and the second set of semantic dimensions represents a multi-dimensional space having a second number of dimensions; and
generating a semantic value for the item based at least on the first and second quantized representations, the semantic value usable to determine a semantic relevance between the item and other items in the set of items, at least one of the first and second quantized representations being shared by the item with at least one other item in the set of items.
10. The method of claim 9, wherein the semantic values are represented by vectors, the first quantized representation is represented by a first sub-vector, and the second quantized representation is represented by a second sub-vector.
11. The method of claim 9, further comprising:
determining another item in the set of items associated with the item based on the semantic value of the item.
12. The method of claim 11, wherein the item is a first word in a sentence, the other item is a second word in the sentence that will occur after the first word, and determining the other item comprises:
predicting the second term based on a semantic value of the first term.
13. The method of claim 11, wherein determining the other item comprises:
determining a third quantized representation associated with the item in the first set of semantic dimensions based at least on the second quantized representation;
determining a fourth quantized representation associated with the item in the second set of semantic dimensions based at least on the third quantized representation; and
determining the other item from the third and fourth quantized representations.
14. The method of claim 9, wherein determining the first and second quantized representations comprises:
organizing the plurality of items in the set of items into a table such that items in the table that are in a same row have the same quantized representation in the first set of semantic dimensions and items in a same column have the same quantized representation in the second set of semantic dimensions; and
adjusting the allocation locations of the items in the table using a training data set.
15. The method of claim 14, wherein organizing the plurality of items in the set of items into a table comprises:
allocating items having the same prefix, which represents a previous portion of the item, to the same row in the table; and
items with the same suffix are assigned to the same column in the table, the suffix representing a latter part of the item.
16. The method of claim 14, wherein utilizing a training data set to adjust the assigned position of the item in the table comprises:
iteratively performing the following at least once until a convergence condition is satisfied, the convergence condition being related to at least one of: iteration time, iteration times and parameter changes of the training model:
updating the first and second quantized representations based on the assigned position; and
adjusting the allocation location based on the updated first and second quantized representations.
17. A non-transitory computer readable storage medium storing machine executable instructions that, when executed in a device, cause the device to:
obtaining an item set comprising a plurality of items;
determining a first quantized representation of items in the set of items over a first set of semantic dimensions and a second quantized representation over a second set of semantic dimensions, wherein the first set of semantic dimensions represents a multi-dimensional space having a first number of dimensions and the second set of semantic dimensions represents a multi-dimensional space having a second number of dimensions; and
generating a semantic value for the item based at least on the first and second quantized representations, the semantic value usable to determine a semantic relevance between the item and other items in the set of items, at least one of the first and second quantized representations being shared by the item with at least one other item in the set of items.
18. The non-transitory computer readable storage medium of claim 17, wherein the machine executable instructions, when executed in a device, further cause the device to:
determining a third quantized representation associated with the item in the first set of semantic dimensions based at least on the second quantized representation;
determining a fourth quantized representation associated with the item in the second set of semantic dimensions based at least on the third quantized representation; and
determining another item in the set of items associated with the item from the third and fourth quantized representations.
19. The non-transitory computer-readable storage medium of claim 17, wherein determining the first and second quantized representations comprises:
organizing the plurality of items in the set of items into a table such that items in the table that are in a same row have the same quantized representation in the first set of semantic dimensions and items in a same column have the same quantized representation in the second set of semantic dimensions; and
adjusting the allocation locations of the items in the table using a training data set.
20. The non-transitory computer-readable storage medium of claim 19, wherein utilizing a training data set to adjust the assigned position of the item in the table comprises:
iteratively performing the following at least once until a convergence condition is satisfied, the convergence condition being related to at least one of: iteration time, iteration times and parameter changes of the training model:
updating the first and second quantized representations based on the assigned position; and
adjusting the allocation location based on the updated first and second quantized representations.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610818984.3A CN107818076B (en) | 2016-09-12 | 2016-09-12 | Semantic processing for natural language |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN107818076A CN107818076A (en) | 2018-03-20 |
| CN107818076B true CN107818076B (en) | 2021-11-12 |
Family
ID=61601158
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201610818984.3A Active CN107818076B (en) | 2016-09-12 | 2016-09-12 | Semantic processing for natural language |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN107818076B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110807207B (en) * | 2019-10-30 | 2021-10-08 | Tencent Technology (Shenzhen) Co., Ltd. | Data processing method and device, electronic equipment and storage medium |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1310825A (en) * | 1998-06-23 | 2001-08-29 | Microsoft Corporation | Methods and apparatus for classifying text and for building a text classifier |
| CN102646091A (en) * | 2011-02-22 | 2012-08-22 | NEC (China) Co., Ltd. | Dependence relationship labeling method, device and system |
| CN104765728A (en) * | 2014-01-08 | 2015-07-08 | Fujitsu Limited | Method and device for training neural network and method for determining sparse feature vector |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR101524889B1 (en) * | 2007-08-31 | 2015-06-01 | Microsoft Corporation | Identification of semantic relationships within reported speech |
| US8924315B2 (en) * | 2011-12-13 | 2014-12-30 | Xerox Corporation | Multi-task learning using bayesian model with enforced sparsity and leveraging of task correlations |
| US9330090B2 (en) * | 2013-01-29 | 2016-05-03 | Microsoft Technology Licensing, LLC | Translating natural language descriptions to programs in a domain-specific language for spreadsheets |
| US9519859B2 (en) * | 2013-09-06 | 2016-12-13 | Microsoft Technology Licensing, LLC | Deep structured semantic model produced using click-through data |
| CN103955714A (en) * | 2014-04-09 | 2014-07-30 | Institute of Information Engineering, Chinese Academy of Sciences | Method and system for constructing an Internet "water army" (paid-poster) detection model, and detection method |
- 2016
- 2016-09-12: Application CN201610818984.3A filed in China; granted as patent CN107818076B; legal status: Active
Non-Patent Citations (3)
| Title |
|---|
| Recurrent neural network based language model; Tomas Mikolov et al.; INTERSPEECH 2010; 2010-09-30; pp. 1045-1048 * |
| Strategies for Training Large Vocabulary Neural Language Models; Wenlin Chen et al.; arXiv:1512.04906v1; 2015-12-15; pp. 1-13 * |
| Word Embedding: Continuous-Space Representations of Natural Language; Chen Enhong et al.; Journal of Data Acquisition and Processing; 2014-01-31; Vol. 29, No. 1; pp. 19-29 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN107818076A (en) | 2018-03-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Joulin et al. | Efficient softmax approximation for GPUs | |
| US11604956B2 (en) | Sequence-to-sequence prediction using a neural network model | |
| US11030997B2 (en) | Slim embedding layers for recurrent neural language models | |
| Cho et al. | Learning phrase representations using RNN encoder–decoder for statistical machine translation | |
| US11928600B2 (en) | Sequence-to-sequence prediction using a neural network model | |
| Chen et al. | Strategies for training large vocabulary neural language models | |
| Li et al. | LightRNN: Memory and computation-efficient recurrent neural networks | |
| CN111985228B (en) | Text keyword extraction method, text keyword extraction device, computer equipment and storage medium | |
| CN111368996A (en) | Retraining projection network capable of delivering natural language representation | |
| CN111291556B (en) | Chinese entity relation extraction method based on character and word feature fusion of entity meaning item | |
| CN109829162B (en) | Text word segmentation method and device | |
| CN111191002A (en) | Neural code searching method and device based on hierarchical embedding | |
| CN110019471A (en) | Generate text from structured data | |
| CN111709243A (en) | Knowledge extraction method and device based on deep learning | |
| Baltescu et al. | Pragmatic neural language modelling in machine translation | |
| EP4060526A1 (en) | Text processing method and device | |
| US20190266482A1 (en) | Distance based deep learning | |
| Wang et al. | Learning trans-dimensional random fields with applications to language modeling | |
| WO2014073206A1 (en) | Information-processing device and information-processing method | |
| CN112380855B (en) | Method for determining sentence fluency, method and device for determining probabilistic prediction model | |
| KR20230062430A (en) | Method, apparatus and system for determining story-based image sequence | |
| CN114548123A (en) | Machine translation model training method and device, and text translation method and device | |
| CN116401502B (en) | Method and device for optimizing Winograd convolution based on NUMA system characteristics | |
| Zhang et al. | Supervised hierarchical Dirichlet processes with variational inference | |
| CN112784003A (en) | Method for training statement repeat model, statement repeat method and device thereof |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | | |
| SE01 | Entry into force of request for substantive examination | | |
| GR01 | Patent grant | | |