CN111428471B - Intention recognition method, device, equipment and storage medium based on artificial intelligence - Google Patents
- Publication number
- CN111428471B (application number CN202010162325.5A)
- Authority
- CN
- China
- Prior art keywords
- question
- preset
- text
- vector
- cosine value
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the field of artificial intelligence and discloses an artificial intelligence-based intention recognition method, device, equipment and storage medium, which are used to improve the accuracy of forum replies and the efficiency of directing users to a target platform. The method comprises: periodically collecting initial data from a preset forum website; preprocessing the initial data according to a preset service to obtain question text clauses; calculating, in processing order, a cosine value for each question text clause through a term frequency-inverse document frequency (TF-IDF) algorithm to obtain a target answer; performing intention recognition on the question text clause through a deep neural network text classification model to obtain a template type; combining the question text clause, the target answer and a preset link according to the template type; and automatically posting the spliced content as a reply through a preset crawler task, wherein the preset link directs a target user to a target online question-answering system.
Description
Technical Field
The present invention relates to the field of deep learning, and in particular, to an artificial intelligence based intention recognition method, apparatus, device, and storage medium.
Background
As competition across industries intensifies, the cost of acquiring resources keeps rising, and many methods of promotion and traffic diversion have emerged. Typically, the positioning of the product or service and of the user group is clarified first, the scenarios in which target or potential users gather are identified, user preferences are analyzed and mined, and a strategy is formulated to guide users to the target platform, thereby achieving online promotion and traffic diversion.
Methods of diverting traffic by replying to forum posts already exist, but the replies are often low-quality advertising posts that cannot respond accurately to user preferences, so reply accuracy is low and users are directed inefficiently.
Disclosure of Invention
The main purpose of the invention is to solve the technical problems of low accuracy of forum replies and low efficiency in directing users to a target platform.
The first aspect of the invention provides an artificial intelligence-based intention recognition method, comprising: periodically collecting initial data from a preset forum website; preprocessing the initial data according to a preset service to obtain question text clauses; calculating, in processing order, a cosine value for each question text clause through a term frequency-inverse document frequency (TF-IDF) algorithm, and determining a target answer according to the cosine value; performing intention recognition on the question text clause through a deep neural network text classification model to obtain a template type; combining the question text clause, the target answer and a preset link according to the template type; and automatically posting the spliced content as a reply through a preset crawler task, wherein the preset link directs a target user to a target online question-answering system.
Optionally, in a first implementation manner of the first aspect of the present invention, periodically collecting initial data from a preset forum website includes: determining a uniform resource locator (URL) address of the preset forum website; periodically accessing the preset forum website through a preset crawler task and the URL address to obtain web page data; intercepting the web page data according to a preset page identifier to obtain the initial data, where the initial data includes escape characters, spaces, web page tags and web page content; and recording the initial data and the URL address of the preset forum website in a preset data table.
Optionally, in a second implementation manner of the first aspect of the present invention, preprocessing the initial data according to the preset service to obtain question text clauses includes: determining a keyword of the preset service and extracting data from the initial data according to the keyword; deleting the escape characters and spaces from the extracted data by text processing, and deleting the web page tags, to obtain text data; and splitting the text data into clauses and deleting the clauses that are empty strings to obtain the question text clauses.
Optionally, in a third implementation manner of the first aspect of the present invention, calculating a cosine value for the question text clauses in processing order through the TF-IDF algorithm and determining a target answer according to the cosine value includes: preprocessing the question text clause in processing order to obtain an initial vocabulary and a first question vector; vectorizing each preset question according to the TF-IDF algorithm and the initial vocabulary to obtain a second question vector; calculating the cosine value from the first question vector and the second question vector; and setting the answer corresponding to the preset question with the largest cosine value as the target answer.
Optionally, in a fourth implementation manner of the first aspect of the present invention, preprocessing the question text clause in processing order to obtain an initial vocabulary and a first question vector includes: writing the plurality of question text clauses to the tail of a preset message queue in processing order; setting a timeout duration for each question text clause, where the timeout duration is greater than or equal to 0; when the timeout duration corresponding to a question text clause equals 0 and the clause is still in a waiting state, adding the clause to a delay task queue, where the delay task queue is used to process question text clauses whose timeout has expired; and when the timeout duration corresponding to a question text clause is greater than 0, performing word segmentation and part-of-speech tagging on the clause to obtain the initial vocabulary and the first question vector.
Optionally, in a fifth implementation manner of the first aspect of the present invention, vectorizing the preset question according to the TF-IDF algorithm and the initial vocabulary to obtain a second question vector includes: obtaining the initial vocabulary, where the initial vocabulary includes a plurality of words; reading the number of preset questions from a preset database; counting, according to the TF-IDF algorithm, the number of occurrences of each of the plurality of words in the question text clause; counting the number of preset questions that contain each of the plurality of words; and vectorizing the preset question according to a first preset formula to obtain the second question vector.
Optionally, in a sixth implementation manner of the first aspect of the present invention, calculating the cosine value from the first question vector and the second question vector and setting the answer corresponding to the preset question with the largest cosine value as the target answer includes: calculating the cosine value from the first question vector and the second question vector according to a second preset formula, where the second preset formula is $\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}$, in which $\mathbf{a}$ is the first question vector, $\mathbf{b}$ is the second question vector, and $\cos\theta$ indicates the similarity between the first question vector $\mathbf{a}$ and the second question vector $\mathbf{b}$; sorting the answers corresponding to the preset questions in descending order of cosine value; and setting the answer corresponding to the preset question with the largest cosine value as the target answer.
A second aspect of the invention provides an artificial intelligence-based intention recognition device, comprising: an acquisition unit, used for periodically collecting initial data from a preset forum website; a preprocessing unit, used for preprocessing the initial data according to a preset service to obtain question text clauses; a calculation unit, used for calculating, in processing order, a cosine value for each question text clause through a term frequency-inverse document frequency (TF-IDF) algorithm and determining a target answer according to the cosine value; an intention recognition unit, used for performing intention recognition on the question text clause through a deep neural network text classification model to obtain a template type; and a reply unit, used for combining the question text clause, the target answer and a preset link according to the template type and automatically posting the spliced content as a reply through a preset crawler task, wherein the preset link directs a target user to a target online question-answering system.
Optionally, in a first implementation manner of the second aspect of the present invention, the collecting unit is specifically configured to determine a uniform resource locator address of a preset forum website, access the preset forum website at regular time through a preset crawler task and the uniform resource locator address of the preset forum website to obtain web page data, intercept the web page data according to a preset page identifier to obtain initial data, where the initial data includes an escape character, a space, a web page tag and web page content, and record the initial data and the uniform resource locator address of the preset forum website in a preset data table.
Optionally, in a second implementation manner of the second aspect of the present invention, the preprocessing unit is specifically configured to: determine a keyword of the preset service and extract data from the initial data according to the keyword; delete the escape characters and spaces from the extracted data by text processing, and delete the web page tags, to obtain text data; and split the text data into clauses and delete the clauses that are empty strings to obtain the question text clauses.
Optionally, in a third implementation manner of the second aspect of the present invention, the computing unit further includes a preprocessing subunit, configured to preprocess the question text clause according to a processed sequence to obtain an initial vocabulary and a first question vector, a text processing subunit, configured to perform text vectorization on a preset question according to a word frequency-inverse text frequency index algorithm and the initial vocabulary to obtain a second question vector, and a computing subunit, configured to calculate the first question vector and the second question vector to obtain an included angle cosine value, and set an answer corresponding to the preset question with the largest included angle cosine value as a target answer.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the preprocessing subunit is specifically configured to write the multiple question text sentences into a tail portion of a preset message queue according to a processed sequence, set a timeout period for the question text sentences, where the timeout period is greater than or equal to 0, add the question text sentences to a delay task queue when the timeout period corresponding to the question text sentences is equal to 0 and the question text sentences are still in a waiting state, where the delay task queue is used to process the question text sentences exceeding a period, and perform word segmentation and part-of-speech labeling for the question text sentences when the timeout period corresponding to the question text sentences is greater than 0, so as to obtain an initial vocabulary and a first question sentence vector.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the text processing subunit is specifically configured to: obtain the initial vocabulary, where the initial vocabulary includes a plurality of words; read the number of preset questions from a preset database; count, according to the TF-IDF algorithm, the number of occurrences of each of the plurality of words in the question text clause; count the number of preset questions that contain each of the plurality of words; and vectorize the preset question according to a first preset formula to obtain the second question vector.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the computing subunit is specifically configured to: calculate the cosine value from the first question vector and the second question vector according to a second preset formula, where the second preset formula is $\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}$, in which $\mathbf{a}$ is the first question vector, $\mathbf{b}$ is the second question vector, and $\cos\theta$ indicates the similarity between the first question vector $\mathbf{a}$ and the second question vector $\mathbf{b}$; sort the answers corresponding to the preset questions in descending order of cosine value; and set the answer corresponding to the preset question with the largest cosine value as the target answer.
A third aspect of the present invention provides an artificial intelligence based intent recognition device including a memory and at least one processor, the memory having instructions stored therein, the memory and the at least one processor being interconnected by a wire, the at least one processor invoking the instructions in the memory to cause the artificial intelligence based intent recognition device to perform the artificial intelligence based intent recognition method as described in the first aspect above.
A fourth aspect of the invention provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the artificial intelligence based intent recognition method of the first aspect described above.
In the technical solution provided by the invention, initial data are periodically collected from a preset forum website; the initial data are preprocessed according to a preset service to obtain question text clauses; a cosine value is calculated for the question text clauses in processing order through a term frequency-inverse document frequency (TF-IDF) algorithm, and a target answer is determined according to the cosine value; intention recognition is performed on the question text clause through a deep neural network text classification model to obtain a template type; the question text clause, the target answer and a preset link are combined according to the template type; and the spliced content is automatically posted as a reply through a preset crawler task, the preset link directing a target user to a target online question-answering system. In the embodiments of the invention, a crawler task periodically captures questions from the target forum, an intelligent question-answering engine evaluates and answers the captured questions, and the crawler task posts replies built from the answers, guiding users to a target page for intelligent question answering. This improves the timeliness and accuracy of the replies and the efficiency of directing users.
Drawings
FIG. 1 is a schematic diagram of one embodiment of an artificial intelligence based intention recognition method in an embodiment of the invention;
FIG. 2 is a schematic diagram of another embodiment of an artificial intelligence based intention recognition method in an embodiment of the invention;
FIG. 3 is a schematic diagram of an embodiment of an artificial intelligence based intention recognition device in an embodiment of the invention;
FIG. 4 is a schematic diagram of another embodiment of an artificial intelligence based intent recognition device in accordance with an embodiment of the invention;
FIG. 5 is a schematic diagram of one embodiment of an artificial intelligence based intent recognition device in an embodiment of the invention.
Detailed Description
The embodiments of the invention provide an artificial intelligence-based intention recognition method, device, equipment and storage medium. A crawler task periodically captures questions from a target forum, an intelligent question-answering engine evaluates and answers the captured questions, and the crawler task posts replies built from the answers, guiding users to a target page for intelligent question answering. This improves the timeliness and accuracy of the replies and the efficiency of directing users.
In order to enable those skilled in the art to better understand the present invention, embodiments of the present invention will be described below with reference to the accompanying drawings.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below with reference to FIG. 1. An embodiment of the artificial intelligence-based intention recognition method according to the embodiments of the present invention includes:
101. Periodically collecting initial data from a preset forum website;
It will be appreciated that the execution subject of the present invention may be an artificial intelligence based intention recognition device, or may be a terminal or a server, and is not limited herein. The embodiment of the invention is described by taking a server as an execution main body as an example.
The server periodically collects initial data from a preset forum website. The initial data comprises the topic posts published in the various sections of the preset forum website, together with the replies and comments on those posts, and is represented as hypertext markup language (HTML) code.
Further, the server determines the uniform resource locator (URL) address of the preset forum website and periodically accesses the website through a preset crawler task and that URL address to obtain web page data; for example, through the preset crawler task the server accesses URL address A of the preset forum website every 1 hour, or every 2 hours in the early morning. The server intercepts the web page data according to a preset page identifier to obtain the initial data, which comprises escape characters, spaces, web page tags and web page content; for example, the web page tags include <html>, <head>, <body>, <div> and <br>. The interception is performed by matching the web page data against a regular expression. The server then records the initial data and the URL address of the preset forum website in a preset data table.
102. Performing data preprocessing on the initial data according to a preset service to obtain question text clauses;
The server performs data preprocessing on the initial data according to the preset service to obtain question text clauses. Data preprocessing comprises data extraction, data cleaning and data conversion: data extraction is the process in which the server extracts data from the initial data according to the preset service, data cleaning is the process of deleting content from the extracted data according to preset rules, and data conversion is the process of converting the cleaned data into question text clauses.
Specifically, when the server detects that the preset data table contains initial data to be processed, it extracts data from the initial data according to the preset service, cleans the extracted data, and converts the cleaned data to obtain the question text clauses.
103. Calculating a cosine value for the question text clauses in processing order through a term frequency-inverse document frequency (TF-IDF) algorithm, and determining a target answer according to the cosine value;
The server calculates a cosine value for the question text clauses in processing order through the TF-IDF algorithm and determines a target answer according to the cosine value. Specifically, the server performs question retrieval and processing on a plurality of question text clauses in parallel through a preset message queue. For example, the server obtains, in processing order, question text clauses comprising clause A, clause B, clause C and clause D, and feeds clause A, clause B, clause C and clause D through the preset message queue in that order.
The server then calculates, through the TF-IDF algorithm, the cosine value between the question text clause and each preset question, and determines the target answer according to the cosine value. Further, using the cosine value, the server finds the preset question that matches the question text clause in the set of existing question-answer pairs held by a preset question-answering center engine, and sets the answer corresponding to the matched preset question as the target answer. For example, if the cosine values calculated by the server are 0.96, 0.18 and 0.57, the server sets the answer corresponding to the preset question with the cosine value 0.96 as the target answer.
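The matching in step 103 can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not reproduce the first preset formula, so the smoothed weight `tf * (ln((1 + N) / (1 + df)) + 1)` is an assumption, as are the function names, the sample question-answer pairs, and the premise that the clauses have already been segmented into word lists.

```python
import math
from collections import Counter


def tfidf_vector(tokens, vocabulary, doc_freq, n_docs):
    """TF-IDF weights over a fixed vocabulary (smoothed IDF is an assumption)."""
    counts = Counter(tokens)
    return [
        counts[w] * (math.log((1 + n_docs) / (1 + doc_freq.get(w, 0))) + 1)
        for w in vocabulary
    ]


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def target_answer(question_tokens, qa_pairs):
    """Return (cosine value, answer) for the best-matching preset question.

    qa_pairs is a list of (preset_question_tokens, answer) tuples.
    """
    vocabulary = sorted(set(question_tokens))  # initial vocabulary from the clause
    n_docs = len(qa_pairs)                     # number of preset questions
    doc_freq = Counter(w for q, _ in qa_pairs for w in set(q) if w in vocabulary)
    first = tfidf_vector(question_tokens, vocabulary, doc_freq, n_docs)
    scored = [
        (cosine(first, tfidf_vector(q, vocabulary, doc_freq, n_docs)), answer)
        for q, answer in qa_pairs
    ]
    return max(scored, key=lambda pair: pair[0])


# Hypothetical, already-segmented question-answer pairs.
qa_pairs = [
    (["need", "commercial", "insurance", "social", "security"],
     "Social security reimbursement is limited."),
    (["how", "claim", "car", "insurance"], "File a claim with the insurer."),
    (["weather", "today"], "No idea."),
]
question = ["social", "security", "need", "commercial", "insurance"]
score, answer = target_answer(question, qa_pairs)
```

Because the vectors are built over the vocabulary of the question text clause, a preset question sharing no words with it yields a zero vector and a cosine value of 0.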
104. Performing intention recognition on the question text clause through a deep neural network text classification model to obtain a template type;
The server performs intention recognition on the question text clause through the deep neural network text classification model to obtain the template type. Specifically, the server segments the sentence through the deep neural network text classification model to obtain a sentence matrix (s, d); for example, for a 7×5 sentence matrix, 7 is the sentence length and 5 is the dimension of the word vectors. The server applies convolution layers to the sentence matrix to obtain feature vectors of different lengths; for example, the convolution kernel sizes are set to (2, 3, 4) in order to find 2-gram, 3-gram and 4-gram features respectively. The server pools the feature vectors of different lengths to obtain feature vectors of a preset length, concatenates and normalizes the feature vectors of the preset length to obtain the probability of each category, and determines the matched template and template type according to the category with the largest probability.
Optionally, the server performs text similarity matching on the question text clause through a preset intention recognition model to obtain a plurality of similarities, determines the largest similarity among them, and determines the matched template and template type according to the largest similarity.
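The shape bookkeeping of this classifier can be illustrated with a small forward pass in plain Python. This is a sketch under stated assumptions only: the weights are random and untrained, and max pooling, the ReLU activation, the number of template types and all names are choices made here for illustration that the text does not specify.

```python
import math
import random

random.seed(0)
S, D, N_CLASSES = 7, 5, 3     # sentence length, word-vector size, template types (assumed)
KERNEL_HEIGHTS = (2, 3, 4)    # kernels spanning 2, 3 and 4 consecutive words


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def convolve(sentence, kernel):
    """Slide an (h, D) kernel down the (S, D) sentence matrix, with ReLU."""
    h = len(kernel)
    return [
        max(0.0, sum(sentence[i + r][c] * kernel[r][c]
                     for r in range(h) for c in range(D)))
        for i in range(len(sentence) - h + 1)
    ]


def softmax(scores):
    """Normalize scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


sentence = rand_matrix(S, D)                             # stand-in for embedded words
kernels = [rand_matrix(h, D) for h in KERNEL_HEIGHTS]
feature_maps = [convolve(sentence, k) for k in kernels]  # lengths 6, 5 and 4
pooled = [max(fm) for fm in feature_maps]                # pooling gives a fixed length of 3
dense = rand_matrix(N_CLASSES, len(pooled))              # fully connected layer
probs = softmax([sum(w * x for w, x in zip(row, pooled)) for row in dense])
template_type = probs.index(max(probs))                  # category with the largest probability
```

The three feature maps have different lengths because each kernel height h produces S - h + 1 positions; pooling is what brings them to a common length before normalization.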
105. Combining the question text clause, the target answer and a preset link according to the template type, and automatically posting the spliced content as a reply through a preset crawler task, wherein the preset link directs the target user to a target online question-answering system.
The server combines the question text clause, the target answer and the preset link according to the template type, and automatically posts the spliced content as a reply through the preset crawler task, wherein the preset link directs the target user to the target online question-answering system. Specifically, the server determines a question-answer template according to the template type, where the question-answer template is a replaceable character string set for that template type; the server combines the text content of the question, the target answer and the preset link based on the question-answer template; and the server automatically posts the spliced content as a reply through the preset crawler task. For example, the spliced content is: "@me I already have social security; do I still need to buy commercial insurance? The medical expenses reimbursed by social security are limited, with a reimbursement limit of about 50%, and social security cannot cover illness, so it needs to be supplemented."
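The splicing step can be sketched with the standard-library `string.Template` class. The template types, placeholder names, fallback behavior and the example URL below are all hypothetical; the text only states that a question-answer template is a replaceable character string chosen by template type.

```python
from string import Template

# Hypothetical question-answer templates keyed by template type.
TEMPLATES = {
    "insurance_advice": Template("@$user $question $answer Details: $link"),
    "generic": Template("@$user $answer See $link"),
}


def splice(template_type, user, question, answer, link):
    """Fill the template selected by template_type; fall back to 'generic'."""
    template = TEMPLATES.get(template_type, TEMPLATES["generic"])
    return template.substitute(user=user, question=question, answer=answer, link=link)


reply = splice(
    "insurance_advice",
    "alice",
    "Do I need commercial insurance?",
    "Social security reimbursement is limited.",
    "https://qa.example.com/?cfg=3",
)
```

Keeping the templates in a lookup table means a new template type only needs a new entry, with no change to the splicing code.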
Further, when the target user clicks the preset jump link in the preset forum, the target user is guided, through the logic configuration index number carried in the link, to the target online question-answering system for the corresponding scenario, and the target online question-answering system conducts online dialogue and automated marketing with the target user.
In the embodiments of the invention, a crawler task periodically captures questions from the target forum, an intelligent question-answering engine evaluates and answers the captured questions, and the crawler task posts replies built from the answers, guiding users to a target page for intelligent question answering. This improves the timeliness and accuracy of the replies and the efficiency of directing users.
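One way to read the index number out of the link is sketched below. The query parameter name `cfg`, the scenario names and the default are assumptions made for illustration; the text does not say how the logic configuration index number is encoded in the link.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical mapping from logic configuration index number to scenario.
SCENES = {"1": "health-insurance-qa", "2": "car-insurance-qa"}


def resolve_scene(link, default="general-qa"):
    """Pick the scenario named by the 'cfg' query parameter of the jump link."""
    query = parse_qs(urlparse(link).query)
    index = query.get("cfg", [""])[0]
    return SCENES.get(index, default)


scene = resolve_scene("https://qa.example.com/entry?cfg=2")
```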
Referring to FIG. 2, another embodiment of the artificial intelligence-based intention recognition method according to the embodiments of the present invention includes:
201. Periodically collecting initial data from a preset forum website;
The server periodically collects initial data from a preset forum website. The initial data comprises the topic posts published in the various sections of the preset forum website, together with the replies and comments on those posts, and is represented as hypertext markup language (HTML) code.
Specifically, first, the server determines a uniform resource locator address of a preset forum website, where the uniform resource locator address is a representation of a location and an access manner of a resource available from the internet, and is an address of a standard resource on the internet. Each file on the internet has a unique url address.
And secondly, the server accesses the uniform resource locator address of the preset forum website at regular time through the preset crawler task to obtain access data. Specifically, the server sends a request to the uniform resource locator address of the preset forum website at regular time through a preset crawler task using a get mode or a post transmitting mode to obtain a return result, and analyzes the return result to obtain webpage data.
And thirdly, the server intercepts the webpage data according to the preset page identification to obtain initial data, wherein the initial data comprises an escape character, a space, a webpage label and webpage content. Wherein the web page data includes an escape character "\", a space, a web page tag and specific contents displayed in the web page, the web page tag includes < html >, < head >, < body >, < div >, < br >. The method comprises the steps that a server sets different preset page identifications according to different webpage data and sets first regular expressions for the different preset page identifications, and the server regularly intercepts the webpage data according to the first regular expressions to obtain initial data, wherein the regular expressions are special character sequences and are used for checking whether a character string is matched with a certain pattern or not.
Finally, the server records the initial data and the uniform resource locator address of the preset forum website into a preset data table.
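By way of a non-limiting illustration, the collection step described above could be sketched in Python as follows. The page identifier `post_body`, the regular expression, the table name `initial_data` and the example URL are assumptions made for this sketch and do not come from the embodiment; the request is only constructed, not sent.

```python
import re
import sqlite3
import urllib.parse
import urllib.request

# Hypothetical "first regular expression", keyed by a hypothetical page identifier.
PAGE_PATTERNS = {
    "post_body": re.compile(r'<div class="post">(.*?)</div>', re.DOTALL),
}


def build_request(url, form=None):
    """Build a GET request, or a POST request when form data is supplied."""
    data = urllib.parse.urlencode(form).encode() if form else None
    method = "POST" if form else "GET"
    return urllib.request.Request(url, data=data, method=method)


def intercept(webpage_data, page_id):
    """Keep only the parts of the page matched by the pattern for page_id."""
    return PAGE_PATTERNS[page_id].findall(webpage_data)


def record(conn, url, fragments):
    """Store the intercepted fragments together with the forum URL."""
    conn.execute("CREATE TABLE IF NOT EXISTS initial_data (url TEXT, content TEXT)")
    conn.executemany(
        "INSERT INTO initial_data VALUES (?, ?)", [(url, f) for f in fragments]
    )
    conn.commit()


html = ('<html><body><div class="post">Can I buy a policy?<br></div>'
        '<div class="ad">x</div></body></html>')
fragments = intercept(html, "post_body")
conn = sqlite3.connect(":memory:")
record(conn, "http://forum.example/board", fragments)
```

A scheduler (for example a cron job) would call these functions at the chosen interval; the intercepted fragments still contain tags and escape characters, which the next step removes.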
202. Performing data preprocessing on the initial data according to a preset service to obtain a question text clause;
The server performs data preprocessing on the initial data according to the preset service to obtain a question text clause. The data preprocessing comprises data extraction, data cleaning and data conversion: data extraction refers to the process in which the server extracts data from the initial data according to the preset service, data cleaning refers to the process of deleting parts of the extracted data according to preset rules, and data conversion refers to the process of processing the cleaned data to obtain the question text clause.
Specifically, first, the server determines the keywords of the preset service and extracts data from the initial data according to the keywords. For example, the server sets "policy" as a keyword and extracts the portion of the initial data related to policies, that is, the server deletes the portion of the initial data unrelated to policies.
Secondly, the server deletes the escape characters, spaces and webpage tags from the extracted data by text processing to obtain text data. Specifically, the server sets a second regular expression for the escape characters, spaces and webpage tags, and processes the extracted data according to the second regular expression to obtain the text data.
Then, the server splits the text data into clauses and deletes any clause that is an empty character string to obtain the question text clauses. Specifically, the server sets clause separators, splits the text data into a plurality of clauses according to a preset segmentation function and the clause separators, traverses the plurality of clauses, and deletes the clauses that are empty character strings to obtain the question text clauses. The clause separators include the semicolon, the period, the exclamation mark and the question mark.
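As a hedged illustration of step 202, the extraction, cleaning and clause-splitting operations could look like the Python sketch below. The regular expressions are assumptions standing in for the "second regular expression"; in addition, whitespace is collapsed to a single space rather than deleted outright so that the English sample stays readable, which is a deliberate deviation from the embodiment (deleting spaces suits Chinese text).

```python
import re

TAG_RE = re.compile(r"<[^>]+>")              # webpage tags such as <div>, <br>
ESCAPE_RE = re.compile(r"\\[a-z]?")          # a literal backslash plus optional letter
SPACE_RE = re.compile(r"\s+")                # runs of whitespace
SEPARATOR_RE = re.compile(r"[;.!?；。！？]")  # semicolon, period, exclamation, question mark


def extract(fragments, keyword):
    """Data extraction: keep only fragments that mention the service keyword."""
    return [f for f in fragments if keyword in f]


def clean(fragment):
    """Data cleaning: drop tags and escape characters, normalise whitespace."""
    text = TAG_RE.sub(" ", fragment)
    text = ESCAPE_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def split_clauses(text):
    """Data conversion: split on clause separators and drop empty strings."""
    return [c.strip() for c in SEPARATOR_RE.split(text) if c.strip()]


def preprocess(fragments, keyword):
    clauses = []
    for fragment in extract(fragments, keyword):
        clauses.extend(split_clauses(clean(fragment)))
    return clauses


sample = [
    "<div>Can I buy a policy?<br>\\n I have diabetes!</div>",
    "<div>Nice weather.</div>",
]
clauses = preprocess(sample, "policy")
```

The second fragment is discarded because it does not contain the keyword, and the trailing empty string produced by splitting on the final "!" is filtered out.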
203. Preprocessing the question text clauses in processing order to obtain an initial vocabulary and a first question vector;
The server preprocesses the question text clauses in processing order to obtain an initial vocabulary and a first question vector. The preprocessing comprises word segmentation, part-of-speech tagging and stop word filtering of the preset questions and the question text clauses. Both the preset questions and the question text clauses consist of sentences. Word segmentation decomposes a sentence into a data structure in units of words; part-of-speech tagging assigns a part of speech to each word according to the context of the sentence, the parts of speech including nouns, pronouns, numerals, measure words and adjectives; and stop word filtering removes noise from the word segmentation result, for example words such as "yes" and "o".
Specifically, the server writes the question text clauses to the tail of a preset message queue in processing order, the preset message queue being used to process the question text clauses sequentially. The server sets a timeout duration for each question text clause, the timeout duration being greater than or equal to 0, and sends the question text clauses to a preset question-answering center engine in first-in first-out order. When the timeout duration corresponding to a question text clause is 0 and the question text clause is still in a waiting state, the server adds the question text clause to a delay task queue, the delay task queue being used to process question text clauses that have exceeded the timeout. When the timeout duration corresponding to a question text clause is greater than 0, the server performs word segmentation and part-of-speech tagging on the question text clause to obtain the initial vocabulary and the first question vector.
It can be understood that the server uses the message-oriented middleware RabbitMQ as the preset message queue. When the server sends a question text clause through RabbitMQ, it sets a message timeout duration for the question text clause, which specifies the time to live of the question text clause. The unit of the timeout duration is the millisecond, and the value of each message's timeout duration is greater than or equal to 0; the timeout durations of different messages may be the same or different, which is not specifically limited here. While a question text clause is in a waiting state within its preset timeout duration, it is sent to the preset question-answering center engine through RabbitMQ in first-in first-out order. For example, the server sends three question text clauses A, B and C with timeout durations of 10 seconds, 30 seconds and 15 seconds, respectively. Once 5 seconds of the timeout durations have elapsed, A, B and C are sent in sequence through RabbitMQ. It should be noted that when RabbitMQ is used to implement the delay task queue, the delay times of A, B and C must be kept consistent.
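The requirement that the delay times be consistent can be illustrated with a small in-memory simulation. The sketch assumes, as we understand the behaviour of per-message time to live in RabbitMQ queues, that an expired message only leaves the queue once it reaches the head; the embodiment itself does not state this reason, and the function name `release_times` is hypothetical. No RabbitMQ client is used.

```python
from collections import deque


def release_times(messages):
    """Simulate a FIFO queue in which expiry is only checked at the head.

    messages: list of (name, ttl_seconds), all published at time 0.
    Returns (name, time) pairs in the order the messages leave the queue
    for the delay task queue.
    """
    queue = deque(messages)
    now = 0
    released = []
    while queue:
        name, ttl = queue.popleft()
        # The head message blocks everything behind it until its own TTL elapses.
        now = max(now, ttl)
        released.append((name, now))
    return released


mixed = release_times([("A", 10), ("B", 30), ("C", 15)])
uniform = release_times([("A", 10), ("B", 10), ("C", 10)])
```

Under this assumption, with timeouts of 10, 30 and 15 seconds, C is held behind B and leaves at 30 seconds rather than 15; with a uniform timeout every clause leaves at its intended time.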
204. Performing text vectorization on the preset questions according to the word frequency-inverse text frequency index (TF-IDF) algorithm and the initial vocabulary to obtain a second question vector;
The server performs text vectorization on the preset questions according to the word frequency-inverse text frequency index (TF-IDF) algorithm and the initial vocabulary to obtain a second question vector. Specifically, the server acquires the initial vocabulary, which comprises a plurality of words; the server reads the number of preset questions from a preset database; according to the word frequency-inverse text frequency index algorithm, the server counts the number of times each of the plurality of words occurs in the question text clause; the server counts the number of preset questions that contain each of the plurality of words; and the server performs text vectorization on the preset questions according to a preset formula based on these counts to obtain the second question vector.
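As an illustration only, the vectorization could be sketched as below, assuming the common TF-IDF weighting $w_i = tf_i \cdot \log(N / df_i)$, where $tf_i$ is the number of occurrences of word $i$ in the text being vectorized, $N$ is the number of preset questions and $df_i$ is the number of preset questions containing word $i$. This form and the function name `tfidf_vector` are assumptions; the exact preset formula of the embodiment may differ.

```python
import math
from collections import Counter


def tfidf_vector(tokens, vocabulary, preset_questions):
    """Return one TF-IDF weight per vocabulary word for a tokenized text.

    tokens: list of words of the text to vectorize
    vocabulary: ordered list of words (the initial vocabulary)
    preset_questions: list of token lists, one per preset question
    """
    n_questions = len(preset_questions)
    counts = Counter(tokens)
    vector = []
    for word in vocabulary:
        # Number of preset questions that contain this word.
        df = sum(1 for question in preset_questions if word in question)
        # Assumed weighting: words absent from every preset question get weight 0.
        idf = math.log(n_questions / df) if df else 0.0
        vector.append(counts[word] * idf)
    return vector


vocabulary = ["buy", "policy", "claim"]
preset_questions = [["buy", "policy"], ["policy", "claim"]]
vector = tfidf_vector(["buy", "policy"], vocabulary, preset_questions)
```

Here "policy" appears in every preset question, so its weight is zero, while "buy" appears in only one of the two and receives weight log 2.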
205. Calculating an included angle cosine value from the first question vector and the second question vector, and setting the answer corresponding to the preset question with the largest included angle cosine value as the target answer;
The server calculates an included angle cosine value from the first question vector and the second question vector. Specifically, the included angle cosine value is calculated as $\cos\theta = \dfrac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$, where $A$ is the first question vector, $B$ is the second question vector, and $\cos\theta$ indicates the similarity between the first question vector $A$ and the second question vector $B$. Further, the server sorts the answers corresponding to the preset questions in descending order of the included angle cosine value and sets the answer corresponding to the preset question with the largest included angle cosine value as the target answer.
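A minimal sketch of step 205 follows, using the standard cosine of the angle between two vectors. The function names and the sample vectors and answers are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def target_answer(first_vector, candidates):
    """candidates: list of (second_question_vector, answer) pairs.

    Sort in descending order of cosine value and return the top answer.
    """
    ranked = sorted(
        candidates, key=lambda c: cosine(first_vector, c[0]), reverse=True
    )
    return ranked[0][1]


candidates = [
    ([0, 1, 0], "answer 1"),
    ([2, 0, 2], "answer 2"),
    ([1, 1, 0], "answer 3"),
]
best = target_answer([1, 0, 1], candidates)
```

Because cosine similarity ignores vector length, the candidate `[2, 0, 2]` is a perfect match for `[1, 0, 1]` and its answer is selected.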
206. Performing intention recognition on the question text clause through a deep neural network text classification model to obtain a template type;
The server performs intention recognition on the question text clause through the deep neural network text classification model to obtain the template type. The deep neural network text classification model includes a text-CNN model. Specifically, the server segments the sentence through the deep neural network text classification model to obtain a sentence matrix (s, d); for example, the sentence matrix is 7*5, where 7 is the sentence length and 5 is the dimension of the word mapping vector. The server applies convolution layers to the sentence matrix to obtain feature vectors of different lengths; for example, the convolution kernel sizes are set to (2, 3, 4), which look for 2-gram, 3-gram and 4-gram features respectively. The server pools the feature vectors of different lengths to obtain feature vectors of a preset length, performs full connection and normalization on the feature vectors of the preset length to obtain the probability of each category, and determines the matched template and template type according to the category with the maximum probability.
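The shapes involved in the text-CNN forward pass can be made concrete with a small pure-Python sketch. It assumes one filter per kernel size, ReLU activation, max pooling and a softmax output over two hypothetical template types; all weights are random and untrained, so the output probabilities carry no meaning beyond demonstrating the data flow.

```python
import math
import random


def conv_feature(sentence, kernel):
    """Slide a (k x d) kernel down an (s x d) sentence matrix.

    Returns s - k + 1 ReLU activations, so the length depends on k.
    """
    k, d = len(kernel), len(kernel[0])
    out = []
    for start in range(len(sentence) - k + 1):
        total = sum(
            sentence[start + i][j] * kernel[i][j]
            for i in range(k)
            for j in range(d)
        )
        out.append(max(0.0, total))
    return out


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_cnn_forward(sentence, kernels, weights):
    feature_maps = [conv_feature(sentence, k) for k in kernels]  # differing lengths
    pooled = [max(fm) for fm in feature_maps]                    # fixed length
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in weights]
    return softmax(logits)                                       # category probabilities


rng = random.Random(0)
s, d = 7, 5  # sentence length 7, word vector dimension 5
sentence = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(s)]
kernels = [
    [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(k)] for k in (2, 3, 4)
]
weights = [[rng.uniform(-1, 1) for _ in range(len(kernels))] for _ in range(2)]
probs = text_cnn_forward(sentence, kernels, weights)
template_type = max(range(len(probs)), key=probs.__getitem__)
```

For the 7*5 sentence matrix, kernel heights 2, 3 and 4 yield feature vectors of length 6, 5 and 4, which pooling reduces to one value each before the fully connected layer.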
Optionally, the server performs text similarity matching on the question text clause through a preset intention recognition model to obtain a plurality of similarities, determines the maximum similarity from the plurality of similarities, and determines a matched template and a template type according to the maximum similarity.
It should be noted that the purpose of intention recognition by the server is to determine whether the question and the answer have marketing value; if they have no marketing value, the question text clause is discarded. For example, for the question text clause "xxx disease cannot buy yyy insurance", when the server performs intention recognition it needs to determine whether the yyy insurance belongs to the target enterprise; if the yyy insurance belongs to the target enterprise, the server determines the matched template A and template type A, and if the yyy insurance does not belong to the target enterprise, the server determines the matched template B and template type B.
207. Combining the question text clause, the target answer and a preset link according to the template type, and intelligently replying with the spliced content through the preset crawler task, wherein the preset link is used to direct the target user to the target online question-answering system.
The server combines the question text clause, the target answer and the preset link according to the template type, and intelligently replies with the spliced content through the preset crawler task, wherein the preset link is used to direct the target user to the target online question-answering system. Specifically, the server determines a question-answer template according to the template type, the question-answer template being a replaceable character string set according to the template type; the server combines the question text clause, the target answer and the preset link based on the question-answer template; and the server intelligently posts the spliced content through the preset crawler task. For example, the spliced content is: "@ I already have social security; do I still need to buy commercial insurance? The medical expenses reimbursed by social security are limited, the reimbursement limit is about 50%, and social security cannot protect against disease, so social security needs to be supplemented."
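The template-based splicing could be sketched as follows. The template strings, the template type names "A" and "B" and the example link are hypothetical placeholders, not the templates of the embodiment.

```python
# Hypothetical question-answer templates, one replaceable string per template type.
TEMPLATES = {
    "A": "@{question} {answer} Ask our assistant for details: {link}",
    "B": "@{question} {answer} A general guide is available here: {link}",
}


def splice(template_type, question, answer, link):
    """Fill the template selected by template_type with the three parts."""
    return TEMPLATES[template_type].format(
        question=question, answer=answer, link=link
    )


reply = splice(
    "A",
    "Do I still need commercial insurance?",
    "Social security reimbursement is limited.",
    "https://qa.example/chat",
)
```

The resulting string is what the crawler task would submit to the forum as a reply.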
Further, when it is detected that the target user accesses the target online question-answering system through the preset link, an online conversation is conducted with the target user through the target online question-answering system. When the target user clicks the preset jump link in the preset forum, the target user is directed to the target online question-answering system of the corresponding scene according to the logic configuration information index number in the link, and online dialogue and automatic marketing are carried out with the target user through the target online question-answering system. For example, for insurance topics, online conversations and automatic marketing are conducted with target users through the target online question-answering system.
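One possible way to carry the logic configuration information index number in the link and route on it is sketched below. The query parameter name `cfg`, the scene table and the URLs are assumptions introduced purely for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical mapping from configuration index number to dialogue scene.
SCENES = {"1": "insurance", "2": "loans"}


def build_link(base_url, index):
    """Embed the configuration index number in the preset link."""
    return f"{base_url}?{urlencode({'cfg': index})}"


def route(link):
    """Pick the dialogue scene from the index number carried by the link."""
    params = parse_qs(urlparse(link).query)
    index = params.get("cfg", [None])[0]
    return SCENES.get(index, "default")


link = build_link("https://qa.example/chat", "1")
scene = route(link)
```

A link without a recognised index number falls back to a default scene rather than failing.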
In the embodiment of the invention, the crawler task periodically captures questions from the target forum, the intelligent question-answering engine evaluates and answers the captured questions, and the crawler task posts replies based on the answer content to attract traffic, guiding users to the target page for intelligent question answering, which improves the timeliness, accuracy and efficiency of traffic-attracting replies to users.
The artificial intelligence-based intention recognition method in the embodiment of the present invention is described above; the artificial intelligence-based intention recognition apparatus in the embodiment of the present invention is described below. Referring to fig. 3, an embodiment of the artificial intelligence-based intention recognition apparatus in the embodiment of the present invention includes:
an acquisition unit 301, configured to acquire initial data from a preset forum website at regular intervals;
a preprocessing unit 302, configured to perform data preprocessing on the initial data according to a preset service to obtain a question text clause;
a calculating unit 303, configured to calculate, in processing order and through a word frequency-inverse text frequency index algorithm, an included angle cosine value for the question text clause and a preset question, and to determine a target answer according to the included angle cosine value;
an intention recognition unit 304, configured to perform intention recognition on the question text clause through the deep neural network text classification model to obtain a template type;
a reply unit 305, configured to combine the question text clause, the target answer and a preset link according to the template type, and to intelligently reply with the spliced content through the preset crawler task, wherein the preset link is used to direct the target user to the target online question-answering system.
Referring to fig. 4, another embodiment of an artificial intelligence based intention recognition apparatus according to an embodiment of the present invention includes:
an acquisition unit 301, configured to acquire initial data from a preset forum website at regular intervals;
a preprocessing unit 302, configured to perform data preprocessing on the initial data according to a preset service to obtain a question text clause;
a calculating unit 303, configured to calculate, in processing order and through a word frequency-inverse text frequency index algorithm, an included angle cosine value for the question text clause and a preset question, and to determine a target answer according to the included angle cosine value;
an intention recognition unit 304, configured to perform intention recognition on the question text clause through the deep neural network text classification model to obtain a template type;
a reply unit 305, configured to combine the question text clause, the target answer and a preset link according to the template type, and to intelligently reply with the spliced content through the preset crawler task, wherein the preset link is used to direct the target user to the target online question-answering system.
Optionally, the acquisition unit 301 may be further specifically configured to:
determine the uniform resource locator address of the preset forum website;
access the preset forum website at regular intervals through a preset crawler task and the uniform resource locator address of the preset forum website to obtain webpage data;
extract a portion of the webpage data according to a preset page identifier to obtain the initial data, wherein the initial data comprises escape characters, spaces, webpage tags and webpage content;
record the initial data and the uniform resource locator address of the preset forum website in a preset data table.
Optionally, the preprocessing unit 302 may be further specifically configured to:
determine keywords of the preset service, and extract data from the initial data according to the keywords;
delete the escape characters, spaces and webpage tags from the extracted data by text processing to obtain text data;
split the text data into clauses, and delete any clause that is an empty character string to obtain the question text clause.
Optionally, the calculating unit 303 may further include:
a preprocessing subunit 3031, configured to preprocess the question text clause in processing order to obtain an initial vocabulary and a first question vector;
a text processing subunit 3032, configured to perform text vectorization on the preset question according to the word frequency-inverse text frequency index algorithm and the initial vocabulary to obtain a second question vector;
a calculating subunit 3033, configured to calculate an included angle cosine value from the first question vector and the second question vector, and to set the answer corresponding to the preset question with the largest included angle cosine value as the target answer.
Optionally, the preprocessing subunit 3031 is further specifically configured to:
write a plurality of question text clauses to the tail of a preset message queue in processing order;
set a timeout duration for each question text clause, the timeout duration being greater than or equal to 0;
when the timeout duration corresponding to a question text clause is equal to 0 and the question text clause is still in a waiting state, add the question text clause to a delay task queue, the delay task queue being used to process timed-out question text clauses;
when the timeout duration corresponding to a question text clause is greater than 0, perform word segmentation and part-of-speech tagging on the question text clause to obtain the initial vocabulary and the first question vector.
Optionally, the text processing subunit 3032 is further specifically configured to:
obtain an initial vocabulary, the initial vocabulary including a plurality of words, the number of words being a positive integer;
read the number of preset questions from a preset database;
count, according to the word frequency-inverse text frequency index algorithm, the number of times each of the plurality of words occurs in the question text clause;
count the number of preset questions that contain each of the plurality of words;
perform text vectorization on the preset questions according to a first preset formula to obtain the second question vector.
Optionally, the calculating subunit 3033 is further specifically configured to:
set the first question vector to $A$;
calculate an included angle cosine value from the first question vector and the second question vector according to a second preset formula, wherein the second preset formula is $\cos\theta = \dfrac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$, where $A$ is the first question vector, $B$ is the second question vector, and $\cos\theta$ indicates the similarity between the first question vector $A$ and the second question vector $B$;
And ordering the answers corresponding to the preset questions according to the order of the cosine values of the included angles from large to small, and setting the answer corresponding to the preset question with the largest cosine value of the included angle as a target answer.
Fig. 3 and fig. 4 above describe the artificial intelligence-based intention recognition apparatus in the embodiment of the present invention in detail from the perspective of modular functional entities; the artificial intelligence-based intention recognition device in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of an artificial intelligence based intent recognition device 500 according to an embodiment of the present invention. The device may vary considerably in configuration or performance, and may include one or more processors (central processing units, CPU) 501, a memory 509, and one or more storage media 508 (e.g., one or more mass storage devices) storing applications 507 or data 506. The memory 509 and the storage medium 508 may be transitory or persistent storage. The program stored on the storage medium 508 may include one or more modules (not shown), each of which may include a series of instruction operations for the artificial intelligence based intent recognition device. Still further, the processor 501 may be configured to communicate with the storage medium 508 and to execute the series of instruction operations in the storage medium 508 on the artificial intelligence based intent recognition device 500.
The artificial intelligence based intent recognition device 500 may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input/output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the device architecture shown in fig. 5 does not constitute a limitation of the artificial intelligence based intent recognition device, which may include more or fewer components than illustrated, may combine certain components, or may use a different arrangement of components.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
While the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that the foregoing embodiments may be modified or equivalents may be substituted for some of the features thereof, and that the modifications or substitutions do not depart from the spirit and scope of the embodiments of the invention.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010162325.5A CN111428471B (en) | 2020-03-10 | 2020-03-10 | Intention recognition method, device, equipment and storage medium based on artificial intelligence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111428471A CN111428471A (en) | 2020-07-17 |
CN111428471B CN111428471B (en) | 2025-04-11 |
Family
ID=71551540
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010162325.5A Active CN111428471B (en) | 2020-03-10 | 2020-03-10 | Intention recognition method, device, equipment and storage medium based on artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111428471B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||