Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In the present disclosure, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As known from the background art, manually writing webpage code in a Web front-end language to build and typeset webpage content and styles places high technical demands on developers and results in low webpage development efficiency. How to improve the efficiency of webpage development is therefore a problem to be solved by the present application.
In order to solve the above problems, the application discloses an online webpage generating method and device based on document processing, which perform intelligent analysis and conversion processing on a document to be converted into a webpage, automatically extract key information such as document content, structure, typesetting and style, and convert the key information to obtain document content characteristics. An optimal webpage template with the highest similarity score is then automatically recommended through a pre-trained webpage template recommendation engine and the document content characteristics, realizing intelligent mapping of the document to the webpage, so that non-technicians can also quickly produce professional webpages, thereby lowering the threshold of webpage development, the difficulty of webpage production and the labor cost, and improving webpage generation efficiency. The specific implementation is illustrated by the following examples.
Referring to fig. 1, a schematic structural diagram of a web page online generation system according to an embodiment of the present application is shown, where the web page online generation system includes a document uploading module, a document parsing module, a web page template matching module, a web page content intelligent filling module, a web page intelligent layout optimizing module, a web page personalized customizing module, a web page intelligent testing and optimizing module, and a web page publishing and intelligent popularizing module.
The data interaction process among the document uploading module, the document analyzing module, the webpage template matching module, the webpage content intelligent filling module, the webpage intelligent layout optimizing module, the webpage personalized customizing module, the webpage intelligent testing and optimizing module and the webpage publishing and intelligent popularization module is as follows:
(1) An intelligent document analysis module:
A user selects a document to be converted into a webpage through a Web interface provided by the webpage online generation system and uploads the document to be converted to a server, wherein the webpage online generation system supports uploading documents in common formats such as Word, slides (PPT), Portable Document Format (PDF) and the like. After uploading is completed, the document is stored under a designated directory of the server. The document to be converted is a document that needs to be converted into a webpage.
And the webpage online generation system performs intelligent analysis on the uploaded document to be converted by using a pre-trained document analysis deep learning model to obtain key information.
The document analysis deep learning model adopts neural network structures such as a convolutional neural network (Convolutional Neural Network, CNN), and can accurately identify elements such as the content, structure and format of a document through training on a large number of document samples. At the same time, natural language processing technology is introduced to perform semantic understanding and key information extraction on the text content. The core code of the parsing engine of the intelligent document parsing module is exemplified as follows:
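A minimal, illustrative sketch of the parsing engine's output contract follows. The deployed engine uses a trained CNN and NLP models; here a rule-based stand-in (with hypothetical helper names and a markdown-style heading heuristic) produces the same content/structure/format fields:

```python
import re

def parse_document(raw_text):
    """Split raw document text into content, structure, and format fields."""
    lines = [ln for ln in raw_text.splitlines() if ln.strip()]
    structure, content = [], []
    for idx, line in enumerate(lines):
        # Heuristic: '#' prefixes stand in for heading levels detected by the model
        m = re.match(r"^(#+)\s*(.*)$", line)
        if m:
            structure.append({"type": "heading", "level": len(m.group(1)),
                              "text": m.group(2), "position": idx})
        else:
            structure.append({"type": "paragraph", "position": idx})
            content.append(line)
    return {"content": "\n".join(content),   # body text of the document
            "structure": structure,          # hierarchy and position info
            "format": {"font": "default", "alignment": "left"}}  # style info
```

The returned dictionary mirrors the content/structure/format parsing result described below.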
It should be noted that the key information specifically includes the subject and core content of the document, the keywords and entities (such as person name, place name, organization name, etc.) of the document, the abstract or summary of the document, the chapter structure and hierarchical relationship of the document, important sentences or paragraphs in the document, key data and facts in the document, emotional tendency and viewpoint attitude of the document, etc.
The parsing result refers to the output of the document parsing model, including:
content: the text content of the document, i.e. the main body part;
structure: structural information of the document, such as hierarchical relationship and position information of chapters, paragraphs, lists and the like;
format: format information of the document, such as style information of fonts, word sizes, colors, alignment modes and the like.
Post-processing the analysis result means that after the original analysis output is obtained, the analysis result is further processed and refined to meet the application requirement. Specifically, the following aspects may be included:
Carrying out grammar analysis, syntax analysis and semantic analysis on text content of content, and extracting key information and semantic structures such as named entities, keywords, abstract, emotion and the like;
integrating and optimizing structural information of the structure to generate a more standard and easy-to-use document outline and directory structure;
and extracting and converting format information of formats, and mapping the format information into webpage formats such as HTML and CSS so as to adapt to the requirements of webpage display.
Natural language processing technology is introduced, and the processes of semantic understanding and key information extraction on text content are shown as A1-A7.
A1: preprocessing the document to be converted to obtain a document to be converted in a preset format; the preprocessing at least comprises cleaning, word segmentation and stop word removal.
In A1, preprocessing operations such as cleaning, word segmentation and stop word removal are performed on the document to be converted, converting it into a structured data format that a computer can understand and process.
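The preprocessing of A1 can be sketched as follows. This is a simplified stand-in: the stop word list and the regular-expression cleaning rule are illustrative assumptions, not the production configuration:

```python
import re

# Illustrative stop word list (assumption; a real system would use a full list)
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}

def preprocess(text):
    """Clean, segment, and remove stop words from raw document text."""
    # Cleaning: strip punctuation and normalise case
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    # Word segmentation + stop word removal
    return [t for t in cleaned.split() if t not in STOP_WORDS]
```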
A2: and identifying key entities of the document to be converted through the named entity identification model.
In A2, key entities such as person names, place names and organization names in the text are identified using the BiLSTM-CRF named entity recognition model.
A3: and extracting keywords from the document to be converted by using a preset algorithm.
Among them, the preset algorithm includes, but is not limited to, the Term Frequency-Inverse Document Frequency (TF-IDF) weighting algorithm commonly used in information retrieval and data mining. The preset algorithm of the scheme is preferably the TF-IDF algorithm.
And extracting important keywords from the document to be converted by using a TF-IDF algorithm, wherein the keywords are used for reflecting the theme and core content of the document.
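The TF-IDF scoring used in A3 can be sketched as follows (the smoothed IDF formula mirrors common implementations; the exact smoothing used by the system is an assumption):

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, top_k=3):
    """Rank tokens of one tokenised document by TF-IDF against a small corpus."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1       # smoothed IDF
        scores[term] = (count / len(doc_tokens)) * idf    # TF * IDF
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]
```

Terms that are frequent in the document but rare in the corpus rank highest, reflecting the document's theme and core content.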
A4: and generating a summary of the document to be converted through a text summary model.
Text summary models include, but are not limited to, a Transformer model.
In A4, a Transformer model is utilized to automatically generate a summary or abstract of the text, extracting the key information of the document to be converted.
A5: and identifying the semantic roles of the document to be converted through the semantic role annotation model.
Wherein the semantic role annotation model includes, but is not limited to, a BERT-CRF model. The semantic role labeling model of the scheme is preferably a BERT-CRF model.
Semantic roles in the text, such as agent, patient, time and place, are identified using the BERT-CRF model to understand the semantic structure of the text.
A6: and determining the emotion tendencies of the document to be converted through an emotion analysis model.
Emotion analysis models include, but are not limited to, long short-term memory network (LSTM) models. The emotion analysis model of the present embodiment is preferably an LSTM model.
And judging the emotion tendencies of the document to be converted, such as positive, negative, neutral and the like, by utilizing the LSTM model, so as to know the views and attitudes of the authors.
A7: and extracting key information from the key entities, the key words, the abstract, the semantic roles and the emotion tendencies.
In A7, according to the application scenario and requirements, the most important key information is extracted from the analysis results (key entities, keywords, abstract, semantic roles, emotion tendencies, and the like) to form a structured text representation.
(2) The intelligent webpage template matching module:
The webpage online generation system converts the key information to obtain document content characteristics, and intelligently recommends an optimal target webpage template based on a pre-trained webpage template recommendation engine according to the document content characteristics extracted by analysis. The target webpage template is the optimal webpage template with the highest similarity score between the document content characteristics and the template characteristics.
Analyzing the document to be converted uploaded by the user, and extracting key document content characteristics such as a theme, keywords, chapter structures, data tables, pictures and the like of the document.
The process of obtaining the target webpage template through the pre-trained webpage template recommendation engine and the document content characteristics is shown as B1-B7.
B1: the document content features are converted into normalized feature vector representations.
It should be noted that the content features of the document are further extracted and converted based on the key information, and they represent the content elements of the document in a more structured and vectorized manner, thus providing directly available feature input for intelligent template recommendation.
The document content features are an application form and representation mode of the key information, and the extraction of the key information is the basis and premise of document content feature construction. The two supplement each other and jointly support the document understanding and template recommendation functions of the webpage online generation system.
The document content features mainly comprise theme features, keyword features, named entity features, emotion features, structural features, semantic features and the like.
Theme characteristics: the topic category to which the document belongs.
Keyword features: important keywords in a document.
Named entity characteristics: name of person, place name, organization name, etc. appearing in the document.
Emotion characteristics: emotional tendency of a document.
The structural characteristics are as follows: chapter structure, paragraph layout, chart position, etc. of the document.
Semantic features: semantic vector representations of documents.
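Step B1 can be sketched as a normalized bag-of-features encoding. The production system maps features through learned embeddings; the one-hot vocabulary used here is an illustrative assumption:

```python
import math

def to_feature_vector(features, vocabulary):
    """Encode a set of document features as an L2-normalised one-hot vector."""
    vec = [1.0 if term in features else 0.0 for term in vocabulary]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]
```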
B2: and carrying out similarity calculation on the standardized feature vector representation and the template feature vectors in the template feature library to obtain the similarity of the current document.
The template feature vector comprises a visual feature vector, text content features, label features and the like of the template.
The visual feature vector mainly reflects the visual appearance of the template, such as layout structure, color scheme, picture style, etc.
The text content features and the tag features reflect the semantic topic and key information of the template.
The webpage online generation system maps the features with different dimensions into a unified high-dimensional vector space through a deep learning model to form a complete template feature vector representation.
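The similarity calculation of B2 is typically cosine similarity between the document feature vector and each template feature vector, for example:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```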
B3: performing personalized template recommendation through a collaborative filtering algorithm based on the similarity between templates historically selected by the user and the current document, obtaining a collaborative filtering result.
And adopting a collaborative filtering algorithm to carry out personalized recommendation on the template based on the similarity between the template selected by the user history and the current document. Collaborative filtering algorithms may discover implicit preferences of a user, recommending templates that are similar to historical selections or selected by other users.
B4: and carrying out semantic matching on the document content characteristics and the template characteristic vectors through CNN in the deep learning to obtain a deep learning result.
The CNN can learn the complex nonlinear relation between the document and the template, and a more accurate deep learning result is given.
B5: and optimizing a recommendation algorithm of a web page template recommendation engine through reinforcement learning technology and feedback of a preset recommendation template to obtain reinforcement learning results.
The preset recommendation template feedback is feedback of a user on the recommendation template, and the feedback of the user on the recommendation template comprises clicking, application, modification and other actions.
And introducing reinforcement learning technology, and dynamically adjusting the recommendation strategy according to feedback (such as clicking, application, modification and other actions) of a user on the recommendation template. The reinforcement learning can continuously optimize the recommendation algorithm, and the user satisfaction and the template application rate are improved.
B6: and obtaining a template recommendation list according to the collaborative filtering result, the deep learning result and the reinforcement learning result.
And combining the collaborative filtering result, the deep learning result and the reinforcement learning result, giving a final template recommendation list, and sequencing templates in the template recommendation list according to the relevance and the user preference for selection and application by a user.
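The fusion of B6 can be sketched as a weighted combination of the three per-template score sets (the weights here are hypothetical; in practice they would be tuned or learned):

```python
def combine_recommendations(cf, dl, rl, weights=(0.3, 0.5, 0.2), top_n=3):
    """Fuse collaborative-filtering, deep-learning and reinforcement-learning
    scores (dicts of template id -> score) into one ranked list."""
    templates = set(cf) | set(dl) | set(rl)
    fused = {t: weights[0] * cf.get(t, 0.0)
                + weights[1] * dl.get(t, 0.0)
                + weights[2] * rl.get(t, 0.0)
             for t in templates}
    return sorted(fused, key=fused.get, reverse=True)[:top_n]
```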
B7: and selecting the target webpage template with the highest similarity score between the document content characteristics and the template characteristics from the template recommendation list.
Through the process, the webpage online generation system can intelligently recommend an optimal target webpage template according to the document content characteristics and by combining the historical behaviors of the user with real-time feedback, and the accuracy and the efficiency of template matching are improved.
The recommendation engine adopts a method combining collaborative filtering and deep learning, and a personalized template matching result is given out by analyzing user behaviors and template characteristics. Meanwhile, a reinforcement learning technology is introduced, and a recommendation strategy is dynamically optimized through feedback of a user. The specific flow is shown in 2.1-2.7:
2.1, constructing a template feature library:
First, the online webpage generating system needs to analyze and process the existing webpage template library. Each template has unique characteristics of content structure, layout style, color collocation and the like. The webpage online generation system uses CNN in deep learning to extract the characteristics of the screenshot of the template to obtain the visual characteristic vector of the template, and simultaneously, uses natural language processing technology to perform semantic analysis on the text content and the label of the template to extract the key words and the theme characteristics, and finally, each template is expressed as a high-dimensional characteristic vector and is stored in a template characteristic library.
The text content of the template comprises text elements such as titles, paragraphs, lists, tables, links and the like in the template. These text contents embody the main information and semantic structure of the template.
The labels refer to keyword descriptions of the semantic subject, style characteristics and applicable scenes of the templates, such as 'clean and elegant', 'business general' and 'product display'. These labels are typically added manually by the template designer or user, reflecting the nature and purpose of the template. The act of a user selecting and applying a template may also be used as implicit label feedback.
The text content and labels of the template are subjected to semantic analysis using natural language processing technology, and the process of extracting keywords and theme features comprises text preprocessing, keyword extraction, topic modeling, label analysis, feature fusion and feature storage:
text preprocessing: and performing preprocessing operations such as word segmentation, stop word removal, part-of-speech tagging and the like on the text content of the template, and converting the text into a structured data format.
Keyword extraction: and identifying important keywords from the text by adopting a TF-IDF keyword extraction algorithm, and reflecting the core content of the template.
Topic modeling: mapping the text to a semantic topic space by adopting the Latent Dirichlet Allocation (LDA) topic model algorithm, and extracting the topic distribution characteristics of the template.
Tag analysis: and performing word segmentation and semantic vectorization on the labels of the templates, and extracting key words and semantic features of the labels.
Feature fusion: and fusing the extracted keyword features, the theme features and the label features to generate a unified text semantic feature vector.
And (3) feature storage: and storing the text semantic feature vectors into a template feature library, and forming a complete template feature representation together with the visual feature vectors.
Through the steps, keywords and theme features rich in semantic information can be extracted from text contents and labels of the templates, and important semantic clues are provided for subsequent template matching and recommendation.
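Building one entry of the template feature library (2.1) can be sketched as follows. Simple term counts stand in for the TF-IDF and LDA features described above, and the helper and field names are hypothetical:

```python
import re
from collections import Counter

def build_template_entry(template_id, text, labels,
                         stop_words=frozenset({"the", "a", "and", "for"})):
    """Extract keyword features from a template's text and attach its labels."""
    # Text preprocessing: clean, lowercase, segment, drop stop words
    tokens = [t for t in re.sub(r"[^\w\s]", " ", text.lower()).split()
              if t not in stop_words]
    # Keyword extraction: most frequent terms stand in for TF-IDF scores
    keywords = [w for w, _ in Counter(tokens).most_common(3)]
    return {"id": template_id, "keywords": keywords, "labels": list(labels)}
```

Each such entry would be stored in the template feature library alongside the template's visual feature vector.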
2.2 User behavior data collection and analysis:
the web page online generation system records various behavior data of the user on the platform, such as browsing, clicking, collecting, using and the like. Through analysis of these behavioral data, the user's preferences, habits and needs can be understood. The system adopts collaborative filtering algorithm, finds out the template use preference of other similar users according to the historical behaviors of the users, and takes the template use preference as the reference of recommendation.
2.3, Constructing a recommendation model:
The webpage online generation system comprehensively utilizes template characteristics, user behavior data and document content characteristics to construct an intelligent recommendation model. The model adopts a neural network structure in deep learning, and mainly comprises the following parts:
input layer: document content features, user features, and template feature data are received.
Embedding layer: the discrete features are vectorized and mapped into a continuous low-dimensional space.
Fusion layer: and fusing the characteristics of the document, the user and the template by using a fully-connected neural network to generate a comprehensive characteristic representation.
Matching layer: and calculating the matching degree score of the comprehensive features and each template in the template feature library through measurement modes such as cosine similarity and the like.
Output layer: and selecting Top-N templates with highest matching degree scores as recommendation results.
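The input-embedding-fusion-matching-output pipeline above can be sketched end to end. A toy deterministic hash embedding stands in for a trained Embedding layer, and all function names are hypothetical:

```python
import hashlib
import math

def embed(feature_ids, dim=4):
    """Toy deterministic embedding: hash each feature id into a dense vector
    (stands in for a trained Embedding layer)."""
    vec = [0.0] * dim
    for fid in feature_ids:
        digest = hashlib.md5(fid.encode()).digest()
        for i in range(dim):
            vec[i] += digest[i] / 255.0 - 0.5
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend(doc_feats, user_feats, template_library, top_n=2):
    """Forward pass: fuse document and user features, match against templates."""
    query = embed(doc_feats + user_feats)             # fusion layer (simplified)
    scored = [(tid, cosine(query, embed(feats)))      # matching layer
              for tid, feats in template_library.items()]
    scored.sort(key=lambda kv: -kv[1])
    return [tid for tid, _ in scored[:top_n]]         # output layer: Top-N
```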
In the recommendation system, the input feature and the target feature are generally distinguished.
Input features refer to independent variable features for training and prediction, including user features, item features, context features, and the like. The template feature data forms a complete input feature space together with the document content features and the user features as part of the input layer.
The target feature refers to a target variable to be predicted by the recommendation system, and is usually a behavior label of scoring, clicking, purchasing and the like of the item by the user. In the template recommendation scenario, the target feature may be whether the user has applied a certain template, or a user satisfaction score with the template, etc.
The template feature data received by the input layer are the original feature representations extracted from the template feature library, such as the visual feature vectors, text content feature vectors and label feature vectors of the templates. These raw feature vectors may be high-dimensional, redundant in information, and not directly usable for training and prediction of the recommendation model. Therefore, when building the recommendation model, the raw feature data received by the input layer typically require further processing and conversion to extract more compact and informative feature representations as inputs to the neural network. This process can be implemented in several ways (feature selection, feature transformation, feature embedding, feature crossing):
Feature selection: and selecting the most relevant feature subset with the most distinguishing degree from the original high-dimensional feature space, reducing feature dimension and noise.
Feature transformation: and the original characteristics are subjected to numerical transformation such as normalization, standardization, discretization and the like, so that the characteristic distribution is smoother, and model learning is facilitated.
Feature embedding: utilizing embedding layers to map high-dimensional sparse discrete features into a low-dimensional dense continuous vector space, mining semantic associations among features.
Feature crossover: and constructing high-order interaction features by means of feature combination, feature product and the like, and capturing nonlinear relations among the features.
After the above feature engineering operations, an optimized low-dimensional dense feature representation is obtained; it serves as the actual input of the neural network and participates, together with the target feature, in the training and optimization of the model.
It should be noted that the template feature data received by the input layer may be regarded as a raw form of the input features, which require a series of feature processing and conversion steps to become the input feature vector finally used for recommendation model training and prediction. This process is an important link in the construction of a recommendation system and directly affects the performance and effect of the model.
2.4 Model training and optimization:
The training of the recommendation model adopts a supervised learning mode, takes the actual selection template of the user as a label, takes the comprehensive characteristics as input, continuously optimizes model parameters through a back propagation algorithm, and minimizes prediction errors. Meanwhile, a reward mechanism in reinforcement learning is introduced, feedback (such as clicking, using and the like) of a user is used as a reward signal, and a recommendation strategy of the model is dynamically adjusted, so that the recommendation strategy can adapt to the continuously changing requirements of the user.
2.5 Online recommendation service:
the trained recommendation model is deployed as an online service, and a document analysis result and a user request are received in real time. When a user uploads a document, the webpage online generation system extracts document content characteristics, combines the characteristics of the user, inputs the characteristics into a recommendation model, and rapidly generates a personalized template recommendation result.
The document content features comprise theme features, keyword features, named entity features, emotion features, structural features, semantic features and the like.
Theme characteristics: the topic categories to which the document belongs, such as news, technology, entertainment, etc., can be extracted using topic models such as LDA.
Keyword features: important keywords in the document reflect the core content of the document and can be extracted by using TF-IDF, a graph-based ranking algorithm (TextRank) for text and other algorithms.
Named entity characteristics: named entities such as person names, place names, mechanism names and the like appearing in the documents can be extracted by using a named entity recognition model.
Emotion characteristics: emotional tendency of a document, such as positive, negative, neutral, etc., can be extracted using an emotion analysis model.
The structural characteristics are as follows: the structural information such as chapter structure, paragraph layout, chart position and the like of the document can be extracted by using a document analysis technology.
Semantic features: the semantic vector representation of the document captures the semantic information of the document and can be extracted by a word vector, a sentence vector, a document vector and other representation learning methods.
The document content features describe the content attributes of the document from different angles, can be used as important input of a recommendation model, help the model understand the document content, and give more accurate template recommendation.
To facilitate understanding of how the webpage online generation system extracts document content features, combines them with the user's features, inputs them into the recommendation model and quickly generates a personalized template recommendation result, the process is exemplified here:
for example, assuming that user a has uploaded a technical blog document, the web page online generation system first parses and extracts features from the document:
Theme characteristics: the topics of the extracted documents are deep learning, computer vision and the like;
keyword features: the keywords of the extracted document are convolutional neural network, image classification, data set and the like;
Named entity characteristics: extracting the scholar names, data set names, algorithm names and the like mentioned in the document;
Emotion characteristics: analyzing the tone and emotional tendency of the document to obtain a neutral-to-positive emotion value;
The structural characteristics are as follows: extracting structural information such as chapter levels, paragraph numbers, chart positions and the like of the document;
Semantic features: the document is converted into a semantic vector representation, such as a document vector generated by Doc2Vec.
At the same time, the system also extracts the following features of user a:
history browsing record: technical documents, courses, papers, etc. previously browsed by user a;
historical application templates: user A selects the type and style of the applied webpage template before;
Personal attributes: registration information such as occupation, academic, age, etc. of the user a;
the user characteristics and the document content characteristics are input into a recommendation model, and the model obtains the interest prediction scores of the user A on different templates through forward calculation, such as:
template 1 (succinct technical wind): 0.85;
template 2 (vivid graphic style): 0.62;
template 3 (professional literature wind): 0.78;
...;
according to the prediction score ranking, the system selects N templates with highest scores to generate a personalized recommendation list:
[template 1, template 3, template 2, ...];
The recommendation result comprehensively considers the historical preference of the user A and the content characteristics of the current document, and gives out the best matched template recommendation. The user A can select the most satisfactory template from the recommendation list to apply and edit;
The process shows an end-to-end personalized template recommendation flow, and by extracting document content characteristics and user characteristics and utilizing an intelligent recommendation model, an optimal webpage template is automatically matched for a user, so that the efficiency and quality of document-to-webpage conversion are improved.
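The score-then-rank step of this example can be sketched directly:

```python
def rank_templates(scores, top_n=3):
    """Order templates by predicted interest score, highest first."""
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]]
```

Applied to the prediction scores above, it yields the recommendation list [template 1, template 3, template 2].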
2.6 Sequencing and displaying the recommended results:
The recommendation model gives a template list ordered by matching degree score. The system displays the top-ranked templates to the user in a friendly manner, such as a card-style layout combining images and text, so that the user can conveniently browse and select.
2.7 User feedback and model update:
the user feedback (such as clicking, collecting, using, etc.) of the recommended results is recorded and used as new behavior data to be supplemented into the user behavior database. Meanwhile, the feedback data are also used for continuously optimizing the recommendation model, and the model can be continuously evolved to adapt to new user demands and preference changes through an incremental training or online learning mode.
The core code of the template matching is exemplified as follows:
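An illustrative core of the template matching step, consistent with B2 and B7: pick the template whose feature vector has the highest cosine similarity with the document feature vector (the feature extraction itself is omitted here):

```python
import math

def best_template(doc_vec, template_library):
    """Return (template_id, score) of the template most similar to the document."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    best_id, best_score = None, -1.0
    for tid, tvec in template_library.items():
        s = cos(doc_vec, tvec)
        if s > best_score:
            best_id, best_score = tid, s
    return best_id, best_score
```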
(3) And the intelligent webpage content filling module is used for:
And the webpage content intelligent filling module automatically fills the document content data obtained by analysis into the target webpage template and generates a corresponding initial webpage. The specific process of generating the corresponding initial web page is shown as C1-C7.
C1: traversing the HTML code of the target web page template.
C2: identifying various placeholders of the target webpage template from its hypertext markup language (HTML) code.
Wherein, each type of placeholder comprises a title placeholder, a paragraph placeholder and the like.
And C3: various types of content elements of the document content data in the key information are determined.
In C3, the various types of content elements include different types of content elements such as titles, paragraphs, lists, pictures, etc. of the document.
And C4: matching various content elements with various placeholders through preset matching rules; the preset matching rule is a rule for searching corresponding (i.e. similar) content elements according to the type of the placeholder;
In C4, matching the content elements in the document parsing result with placeholders in the templates. The principle of matching is to find the most similar content elements based on the type and properties of placeholders. For example, a title placeholder may match a title element in a document, a paragraph placeholder may match a paragraph element, a picture placeholder may match a picture element, and so on. In the matching process, the system also considers the sequence, length, format and other factors of the content elements so as to ensure the rationality of filling.
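The type-based matching rule of C4 can be sketched as follows: each placeholder takes the next unused content element of the same type, preserving document order. The dictionary field names are illustrative assumptions:

```python
def fill_placeholders(placeholders, elements):
    """Match each placeholder to the next unused content element of its type."""
    pools = {}
    for el in elements:                      # group elements by type, in order
        pools.setdefault(el["type"], []).append(el)
    filled = []
    for ph in placeholders:
        pool = pools.get(ph["type"], [])
        filled.append({"placeholder": ph["name"],
                       "content": pool.pop(0)["text"] if pool else None})
    return filled
```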
C5: Intelligently rewrite and optimize the text of the matched content elements; intelligent rewriting makes the text concise and fluent, and optimization improves the efficiency of browsing and information acquisition on the web page.
During filling, natural language generation technology intelligently rewrites and optimizes the text content so that it better matches the language style and expression habits of the web page; the details are shown in 3.1-3.7:
3.1 Structured representation of document parsing results:
The parsing results of the document to be converted are typically stored in a structured JSON format. These data contain the document's different types of content elements, such as titles, paragraphs, lists, and pictures. Each content element has corresponding properties, such as text content, format, and style. The web page online generation system reads the structured data into memory and builds a tree-shaped data structure to facilitate subsequent operations.
3.2 Content placeholder identification of web page templates:
In the target web page template, some content placeholders are typically reserved for indicating filling locations for different types of content. These placeholders may be special HTML tags, CSS class names, or custom tags. The system will traverse the HTML code of the template, identify these placeholders, and record their location and type information.
3.3 Matching of content elements with placeholders:
And the webpage online generation system matches content elements in the document analysis result with placeholders in the target webpage template. The principle of matching is to find the most similar content elements based on the type and properties of placeholders. For example, a title placeholder may match a title element in a document, a paragraph placeholder may match a paragraph element, a picture placeholder may match a picture element, and so on. In the matching process, the system also considers the sequence, length, format and other factors of the content elements so as to ensure the rationality of filling.
3.4 Intelligent rewriting and optimization of content:
The web page online generation system intelligently rewrites and optimizes the content elements before filling them into the placeholders of the target web page template. This mainly uses natural language generation techniques and includes the following steps:
3.4.1 Text analysis: perform lexical, syntactic, and semantic analysis on the text of the content element, and extract key information and semantic structure.
3.4.2 Text rewriting: rewrite the text content appropriately according to the language style and expression habits of the web page. This may include vocabulary replacement, sentence-pattern adjustment, tone conversion, etc. The purpose of rewriting is to make the text more concise, fluent, and readable.
3.4.3 Text optimization: further optimize the text according to the characteristics of web page reading, such as shortening paragraph length, adding subtitles, and highlighting keywords. The optimized text is better suited to quick browsing and information acquisition on the web page.
3.5 Intelligent typesetting and format adjustment of content:
The filled content also needs intelligent typesetting and format adjustment to fit the layout and design style of the web page. This mainly uses CSS and JavaScript techniques, including the following:
Adaptive layout: dynamically adjust the size and position of the content according to its length and the template layout, achieving an adaptive layout effect.
Style optimization: apply appropriate CSS styles to the content elements, such as font, color, and spacing, to harmonize with the design style of the template.
Interaction effects: add interaction effects to the content, such as hover, click, and animation, to enhance the interactivity and appeal of the web page.
3.6 Preview and confirmation of content filling results:
After content filling is completed, the system generates a web page preview for the user to check and confirm. The user may further edit and adjust the filling result, such as modifying text, adjusting pictures, and optimizing typesetting. Once the user confirms there are no errors, the system generates the final web page code.
3.7 Exception handling and logging:
During content filling, abnormal conditions may occur, such as missing content elements, placeholder matching failures, or disordered typesetting. The system must handle these exceptions properly, giving friendly prompts or applying default filling strategies, while recording detailed logs to facilitate problem tracking and debugging.
C6: Fill the intelligently rewritten and optimized content elements into the corresponding placeholders of the target web page template to generate the filled content;
C7: Perform intelligent typesetting and format adjustment on the filled content to generate the corresponding initial web page; the intelligent typesetting includes at least adaptive layout and the addition of interaction effects, and the format adjustment harmonizes the filled content with the design style of the target web page template.
(4) The intelligent web page layout optimization module:
The intelligent web page layout optimization module performs layout analysis on the initial web page in a preset layout analysis mode to obtain an optimal layout scheme; the specific process is shown in D1-D5.
D1: Take a screenshot of the initial web page and perform image preprocessing on it.
This can be implemented with the headless browser Puppeteer, which simulates a user visiting the initial web page and generates a screenshot. The captured screenshot then undergoes image preprocessing, such as scaling to a fixed size and conversion to grayscale, to facilitate subsequent feature extraction and analysis.
D2: Extract visual-element features from the preprocessed screenshot using feature extraction methods to obtain the key visual elements of the web page.
The feature extraction methods include edge detection, color analysis, texture analysis, and the like.
The key visual elements include HTML-related elements, layout elements, text elements, image elements, video elements, and the like.
D3: Analyze the spatial relationships among the key visual elements of the web page to obtain spatial relationship information.
The spatial relationships between the key visual elements include relative position, size, alignment, and the like.
D4: constructing a layout optimized reinforcement learning model according to each key visual element and the spatial relation information; the layout optimized reinforcement learning model at least comprises a web page layout state representation, an action space, a reward function and a reinforcement learning algorithm.
D5: and obtaining an optimal layout scheme through the reinforcement learning model of layout optimization.
Through continual trial and iteration, the reinforcement learning model for layout optimization finds the optimal layout scheme.
Computer vision and deep learning algorithms perform intelligent layout analysis and optimization on the content-filled web page. By extracting and learning features from the web page screenshot, the important visual elements and the spatial relationships between them are identified; a reinforcement learning algorithm then finds an optimal layout scheme through continual trial and iteration, dynamically adjusting attributes such as the size, position, and spacing of content blocks so that the visual presentation of the web page is friendlier and more attractive. The details are as follows:
4.1 Acquiring and preprocessing the web page screenshot:
First, the web page online generation system captures a screenshot of the initial web page filled with content. This may be implemented with the headless browser Puppeteer, which simulates a user visiting the initial web page and generates a screenshot. The captured screenshot then undergoes image preprocessing, such as scaling to a fixed size and conversion to grayscale, to facilitate subsequent feature extraction and analysis.
4.2 Feature extraction of visual elements:
Next, the web page online generation system analyzes the web page screenshot using feature extraction methods from computer vision. Common methods include:
Edge detection: extract edge information from the image with the Canny algorithm to highlight the outlines of content blocks.
Color analysis: perform statistics and clustering on the image's color distribution to identify theme colors and color schemes.
Texture analysis: extract texture features from the image, for example with the SIFT algorithm, to capture the visual patterns of content blocks.
Code examples are as follows:
import cv2
# Read the web page screenshot in grayscale
image = cv2.imread('webpage_screenshot.png', cv2.IMREAD_GRAYSCALE)
# Gaussian blur for noise reduction
blurred = cv2.GaussianBlur(image, (5, 5), 0)
# Canny edge detection
edges = cv2.Canny(blurred, 50, 150)
# Display the result
cv2.imshow('Edges', edges)
cv2.waitKey(0)
cv2.destroyAllWindows()
The following example extracts SIFT features:
import cv2
# Read the image
image = cv2.imread('image.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Create the SIFT feature extractor
sift = cv2.SIFT_create()
# Detect and extract SIFT features
keypoints, descriptors = sift.detectAndCompute(gray, None)
# Draw the keypoints
image_with_keypoints = cv2.drawKeypoints(image, keypoints, None)
# Display the result
cv2.imshow('SIFT Keypoints', image_with_keypoints)
cv2.waitKey(0)
cv2.destroyAllWindows()
4.3 Spatial relationship analysis of visual elements:
Through feature extraction, the web page online generation system obtains the key visual elements of the web page and then analyzes the spatial relationships between them, such as relative position, size, and alignment. This can be achieved through the following steps:
Coordinate and size extraction: record the bounding-box coordinates and size of each visual element.
Hierarchical structure analysis: construct a tree-shaped hierarchy from the nesting and overlap relations of the elements to represent their subordination.
Alignment and distribution analysis: calculate the alignment between elements (left-aligned, right-aligned, centered, etc.) and their distribution (uniform, clustered, etc.).
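The bounding-box analysis above can be sketched as follows; the relation names and the example element boxes are illustrative assumptions:

```python
def analyze_spatial_relation(a, b):
    """Simple spatial relations between two bounding boxes given as
    (x, y, width, height) tuples; a sketch of the step 4.3 analysis."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return {
        'left_aligned': ax == bx,           # shared left edge
        'a_above_b': ay + ah <= by,         # a ends before b starts vertically
        'vertical_gap': by - (ay + ah),     # spacing between the two blocks
        'same_width': aw == bw,             # identical widths
    }

header = (0, 0, 800, 100)   # hypothetical header block
body = (0, 120, 800, 400)   # hypothetical body block
print(analyze_spatial_relation(header, body))
```

A real system would compute such relations pairwise over all extracted elements and feed them, together with the hierarchy, into the reinforcement learning model described next.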
4.4 Reinforcement learning model for layout optimization:
A reinforcement learning model is constructed from the key visual elements and the spatial relationship information, and the optimal layout scheme is found through continual trial and iteration. The model mainly comprises a state representation, an action space, a reward function, and a learning algorithm:
State representation: the current web page layout is expressed as a state vector containing attributes such as the position, size, and alignment of each visual element.
Action space: a set of layout adjustment actions is defined, such as moving an element, resizing an element, or changing its alignment.
Reward function: a reward function is designed to evaluate layout quality, considering factors such as regularity, aesthetics, and legibility.
Learning algorithm: a reinforcement learning algorithm such as Q-learning learns the optimal layout strategy through continual interaction with and feedback from the environment.
Code example:
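A minimal tabular Q-learning sketch along these lines is shown below. The single-number state encoding, toy reward, and action set are illustrative assumptions, not the system's actual model:

```python
import random

random.seed(0)  # deterministic for illustration

# The state is simplified to one "alignment score" in 0..10 that actions
# nudge; a real system would encode element positions/sizes and use a
# learned layout-quality reward.
ACTIONS = ['move_left', 'move_right', 'grow', 'shrink']
MOVES = {'move_left': -1, 'move_right': 1, 'grow': 2, 'shrink': -2}

def reward(state):
    # Toy reward: layouts closer to the target score of 5 are better
    return -abs(state - 5)

def step(state, action):
    return max(0, min(10, state + MOVES[action]))

q = {}  # (state, action) -> estimated value

def choose_action(state, epsilon=0.2):
    # Epsilon-greedy exploration over the action space
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

alpha, gamma = 0.5, 0.9
for _ in range(300):                # training episodes
    state = random.randint(0, 10)
    for _ in range(20):
        action = choose_action(state)
        nxt = step(state, action)
        best_next = max(q.get((nxt, a), 0.0) for a in ACTIONS)
        old = q.get((state, action), 0.0)
        # Standard Q-learning update
        q[(state, action)] = old + alpha * (reward(nxt) + gamma * best_next - old)
        state = nxt

# The greedy action from state 3 should move the score toward the target 5
best = max(ACTIONS, key=lambda a: q.get((3, a), 0.0))
print(best)
```

The same loop structure applies when the state is a full layout vector and the reward scores regularity, aesthetics, and legibility.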
4.5 application of layout optimization results:
The intelligent web page layout optimization module determines the corresponding modification parameters according to the optimal layout scheme and, under the responsive layout, dynamically adjusts the Cascading Style Sheets (CSS) properties of the content blocks in the initial web page, such as position, size, and spacing (margin, padding), according to those parameters to obtain the target web page; the responsive layout is determined by the device type and screen size.
The web page online generation system applies the optimal layout scheme obtained from the reinforcement learning model to the actual web page. This requires dynamically adjusting the CSS styles of the content blocks, such as position, size, and spacing (margin, padding). Responsive layout across different devices and screen sizes is also considered, ensuring a good visual effect of the web page in all environments.
(5) The web page personalized customization module:
When the web page personalized customization module receives a web page personalization request, it determines the corresponding personalization adjustment parameters through a pre-built user portrait and intent understanding model; the parameters cover at least content recommendation, style, and interaction mode, and the target web page is dynamically adjusted according to them during generation.
The webpage online generation system builds a user portrait and intention understanding model by analyzing data such as historical behaviors, preferences, scenes and the like of the user. In the webpage generation process, content recommendation, style, interaction mode and the like of the webpage are dynamically adjusted, and personalized customized webpage experience is provided for different users.
The user can also carry out fine tuning and secondary creation on the webpage through a visual editing tool provided by the system, and personalized requirements are further met, specifically as follows:
5.1 user data collection and processing:
First, the web page online generation system needs to collect various user data, including browsing history, search records, and click behavior. These data may be obtained through user login, cookie tracking, event tracking (instrumentation), and the like. The collected raw data must be cleaned, deduplicated, formatted, and otherwise preprocessed to facilitate subsequent analysis and mining.
5.2 User portrayal construction:
The web page online generation system constructs a user portrait from the collected user data using machine learning algorithms. Common algorithms include:
5.2.1 Clustering: divide similar users into groups with the K-means algorithm and discover their common characteristics.
5.2.2 Association rule mining: discover association rules among user behaviors with the Apriori algorithm and predict users' potential interests.
5.2.3 Matrix factorization: decompose the user-item matrix into a user feature matrix and an item feature matrix using SVD, extracting latent features.
Code example:
from sklearn.cluster import KMeans
# User feature data
user_features = [
    [0.2, 0.8, 0.3],
    [0.1, 0.9, 0.4],
    [0.7, 0.2, 0.6],
    [0.8, 0.3, 0.5],
]
# Set the number of clusters
num_clusters = 2
# Create the K-means clusterer
kmeans = KMeans(n_clusters=num_clusters)
# Fit the clusterer
kmeans.fit(user_features)
# Obtain the clustering results
cluster_labels = kmeans.labels_
# Print the clustering results
for i, label in enumerate(cluster_labels):
    print(f"User {i} belongs to cluster {label}")
5.3 User intent understanding:
On the basis of the user portrait, the web page online generation system needs to further understand the user's real-time intent. This can be achieved by:
5.3.1 Query understanding: analyze the user's search queries and extract keywords and semantic information through natural language processing techniques such as lexical analysis, syntactic analysis, and semantic understanding.
5.3.2 Scene recognition: infer the user's scene and task, such as work, study, or entertainment, from contextual information such as access time, location, and device.
5.3.3 Behavior sequence analysis: model the user's behavior sequence with a sequence model such as an HMM or LSTM to predict the user's next actions and long-term goals.
5.4 Personalized web page generation:
Based on the results of user portrait construction and intent understanding, the system dynamically generates personalized web page content and styles, mainly in the following aspects:
5.4.1 Content recommendation: recommend related articles, products, services, and other content according to the user's interests and preferences, improving engagement and satisfaction.
5.4.2 Style adaptation: dynamically adjust the web page's color scheme, layout style, interaction mode, etc. according to the user's aesthetic preferences and usage scenario, creating a comfortable visual experience.
5.4.3 Interaction optimization: optimize the web page's navigation structure, functional layout, operation flow, etc. according to the user's behavioral habits and task requirements, improving efficiency and convenience of use.
Code example:
from sklearn.metrics.pairwise import cosine_similarity
# User-item rating matrix
ratings = [
    [4, 3, 0, 5],
    [0, 0, 3, 0],
    [2, 4, 0, 1],
    [0, 3, 0, 2],
]
# Calculate the similarity between users
user_similarity = cosine_similarity(ratings)
# Find users similar to the target user
target_user = 0
similar_users = user_similarity[target_user].argsort()[::-1][1:]
# Collect items liked by similar users
recommended_items = set()
for user in similar_users:
    liked_items = set(i for i, rating in enumerate(ratings[user]) if rating > 0)
    recommended_items.update(liked_items)
# Filter out items the target user has already rated
recommended_items -= set(i for i, rating in enumerate(ratings[target_user]) if rating > 0)
# Print the recommendation results
print(f"Items recommended for user {target_user}:", recommended_items)
5.5 Visual editing and secondary creation:
Visual editing tools allow users to personalize and secondarily author generated web pages. The user can modify the content, layout, style and the like of the webpage in a dragging, configuration and other modes, so that finer granularity customization is realized.
(6) The intelligent web page testing and optimization module:
The web page online generation system integrates an intelligent web page quality testing and optimization engine: the intelligent testing and optimization module evaluates the generated web page along multiple dimensions, including page performance, user experience, and Search Engine Optimization (SEO) friendliness. The engine adopts machine learning algorithms, automatically discovers problems and optimization points in the web page through comparison with a standard dataset, and gives corresponding repair and optimization suggestions. Meanwhile, the system tracks users' actual access behavior data and continuously iterates on and optimizes the web page generation strategy.
6.1 Evaluation index of web page quality:
First, a series of web page quality assessment metrics need to be defined, covering different dimensions and aspects. Common indicators include:
Page loading speed: measure web page loading times, such as first render time and first contentful paint.
Responsiveness: evaluate the response speed and stability of the web page under different devices and network conditions.
Code quality: check the standards compliance, semantic markup, and maintainability of the web page's HTML, CSS, JavaScript, etc.
User experience: evaluate the user friendliness and usability of the web page's layout, design, interaction, etc.
SEO optimization: analyze the optimization of SEO elements such as the web page's keywords, title, description, and links.
Collecting webpage quality data:
To evaluate web page quality, the system needs to collect various relevant data. Mainly comprises the following aspects:
Performance indicators: collect indicators of page loading, rendering, and interaction through a browser automation tool such as Puppeteer, which simulates user visits to the web page.
Code indicators: check the quality and standards compliance of the web page code using static analysis tools such as ESLint and JSHint.
Search engine data: obtain the web page's ranking, indexing, traffic, and other data from search engines to evaluate the SEO optimization effect.
Code example:
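As a simplified, offline stand-in for the collectors above (Puppeteer for performance, ESLint for code quality), the sketch below scans raw HTML for a few static quality indicators; the metric names and checks are assumptions for this example:

```python
from html.parser import HTMLParser

class QualityCollector(HTMLParser):
    """Collects simple static quality indicators from raw HTML."""
    def __init__(self):
        super().__init__()
        self.metrics = {'has_title': False, 'has_description': False,
                        'images': 0, 'images_missing_alt': 0}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'title':
            self.metrics['has_title'] = True
        elif tag == 'meta' and attrs.get('name') == 'description':
            self.metrics['has_description'] = True
        elif tag == 'img':
            self.metrics['images'] += 1
            if not attrs.get('alt'):          # missing alt text hurts SEO
                self.metrics['images_missing_alt'] += 1

def collect_quality_metrics(html):
    collector = QualityCollector()
    collector.feed(html)
    return collector.metrics

sample = '<html><head><title>Demo</title></head><body><img src="a.png"></body></html>'
print(collect_quality_metrics(sample))
# → {'has_title': True, 'has_description': False, 'images': 1, 'images_missing_alt': 1}
```

Records like these, merged with timing data from a browser automation tool, would form the feature vectors used to train the models described next.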
6.2 Machine learning model training:
The web page online generation system trains a machine learning model by utilizing the collected web page quality data, and is used for automatically evaluating and predicting the comprehensive quality score of the web page. To train the machine learning model, a standard web page quality dataset needs to be prepared. The data set should contain a large number of web page samples of different quality, and corresponding manual assessment scores. Each web page sample needs to extract various quality metrics, such as page loading speed, code quality, user experience, etc., which will be input features of the machine learning model.
Next, a suitable machine learning algorithm is selected to build the model. Common algorithms include regression models, classification models, and cluster models.
For regression models, such as linear regression and decision tree regression, the goal is to predict the overall quality score of a web page based on its various quality metrics. And training a regression model to fit the mapping relation by taking the webpage quality index as an input characteristic and the manual evaluation score as a target variable. By minimizing the difference between the predicted score and the actual score, the regression model is able to learn the laws of web page quality assessment.
For classification models, such as logistic regression and support vector machines, the goal is to divide web pages into different quality levels, such as excellent, good, general, poor, etc. Taking the webpage quality index as an input characteristic, taking the quality grade of manual evaluation as a target class, and training a classification model to learn the discrimination boundary of the webpage quality. By maximizing the probability of correct classification, the classification model can master the strategy of web page quality assessment.
For clustering models, such as K-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), the goal is to automatically group web pages of similar quality together and discover the common features of different quality levels. With the web page quality indicators as input features, a clustering algorithm divides the web page samples into several clusters. Web pages within a cluster have similar quality, while quality differs more between clusters. By analyzing the center point or representative samples of each cluster, the typical characteristics and optimization directions of the different quality classes can be summarized.
In the model training process, cross-validation is usually adopted: the dataset is divided into several subsets used in turn as training and validation sets. Through multiple rounds of cross-validation, one can evaluate the generalization performance of the model and adjust its hyperparameters to obtain optimal performance.
After training is completed, the model is finally evaluated using the test set. The test set is a set of independent web page samples used to verify the predictive ability of the model on unknown data. The webpage quality indexes of the test set are input into the trained model, the quality scores or grades predicted by the model are compared with the actual manual evaluation results, and evaluation indexes such as average absolute errors, accuracy and the like are calculated to measure the performance of the model.
Finally, the trained model is integrated into the intelligent web page testing and optimization module. When a new web page is generated, its quality indicators are extracted and fed into the model for prediction. The model automatically gives the web page's comprehensive quality score or grade and provides corresponding optimization suggestions based on the prediction. By continuously collecting user feedback and interaction data, the model can be periodically updated and optimized to adapt to changing web page quality standards and user requirements.
6.3 Webpage problem diagnosis and optimization suggestion:
the web page online generation system can automatically diagnose various quality problems existing in the web page by using the trained machine learning model, and give out corresponding optimization suggestions. The diagnostic and advice process generally includes the following steps:
Problem identification: input the web page's indicators into the machine learning model and predict its quality score and grade. A low score or poor grade indicates that the web page has quality problems.
Problem localization: locate the key indicators and page elements causing the quality problems through techniques such as feature importance analysis and outlier detection.
Optimization suggestions: based on the localization results, generate targeted optimization suggestions by combining predefined optimization rules and an experience knowledge base, such as compressing pictures, merging CSS files, and optimizing keywords.
Code example:
from sklearn.ensemble import RandomForestRegressor
# Web page quality dataset
X_train = [
    [0.8, 0.9, 0.7, 0.6],
    [0.6, 0.7, 0.8, 0.7],
    [0.9, 0.8, 0.9, 0.8],
    [0.5, 0.6, 0.7, 0.6],
]
y_train = [0.8, 0.7, 0.9, 0.6]
# Train a random forest regression model
model = RandomForestRegressor()
model.fit(X_train, y_train)
# Web page indicators to be evaluated
X_test = [[0.7, 0.8, 0.6, 0.7]]
# Predict the web page quality score
y_pred = model.predict(X_test)
print('Web page quality score:', y_pred[0])
# Obtain the feature importances of the indicators
importances = model.feature_importances_
print('Indicator importances:', importances)
6.4 Iterative optimization of the web page generation strategy:
According to the results of web page quality evaluation and user behavior analysis, the web page online generation system continuously iterates on and optimizes its web page generation strategy, mainly in the following aspects:
6.4.1 Template optimization: improve the design and architecture of the web page templates according to the quality evaluation results, such as optimizing layout, simplifying structure, and improving performance.
6.4.2 Content optimization: adjust the web page's content strategy according to user interests and feedback, such as adding topical subjects, optimizing keywords, and improving copy.
6.4.3 Interaction optimization: optimize the web page's interaction design according to user behavior data, such as simplifying operation flows, providing personalized recommendations, and improving the navigation experience.
6.4.4 Algorithm optimization: continuously refine the machine learning algorithms and models according to accumulated quality data and user feedback, improving the accuracy of web page quality evaluation and generation.
Code example:
import tensorflow as tf
# X_train, y_train, X_test, y_test are the quality datasets prepared above
# Define the model structure
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1)
])
# Compile the model
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
# Train the model
history = model.fit(X_train, y_train, epochs=100, validation_split=0.2)
# Evaluate the model
loss, mae = model.evaluate(X_test, y_test)
print('Mean absolute error:', mae)
# Fine-tune the model
model.fit(X_train, y_train, epochs=50, validation_split=0.2)
# Save the optimized model
model.save('optimized_model.h5')
Through continuous webpage quality evaluation, user behavior analysis and generation strategy optimization, the system can continuously improve comprehensive quality and user experience of webpages, and can adapt to continuously changing user requirements and market trends.
(7) The web page publishing and intelligent popularization module:
After the target web page is generated, the web page publishing and intelligent popularization module hosts it on a preset Web server and performs search engine optimization on it.
The web page online generation system adopts a high-performance web server such as Nginx to host the web page, ensuring access speed and stability. It also performs SEO optimization on the web page to increase its exposure in search engines.
7.1 Web server deployment:
A high-performance web server needs to be selected to host the generated web pages. Commonly used choices include Nginx and the open-source Apache HTTP Server (Apache). Taking Nginx as an example, the web page deployment flow is as follows:
Upload the generated web page files (HTML, CSS, JavaScript, etc.) to the specified directory on the server.
Configure an Nginx virtual host, specifying the path of the web page files and the access domain name.
Tune Nginx's performance parameters, such as the number of concurrent connections and the caching strategy, to improve the access speed and stability of the web page.
Nginx configuration file:
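For example, a minimal illustrative virtual-host configuration along these lines (the domain, paths, and parameter values are placeholders, not an actual deployment):

```nginx
# Illustrative Nginx virtual host for the generated pages (placeholders)
server {
    listen 80;
    server_name example.com;            # placeholder access domain
    root /var/www/generated_pages;      # directory holding the uploaded files
    index index.html;

    # Cache static assets to speed up repeat visits
    location ~* \.(css|js|png|jpg)$ {
        expires 7d;
    }

    location / {
        try_files $uri $uri/ =404;
    }
}
# Concurrency and compression are tuned in the main configuration, e.g.
# worker_connections 1024; in the events block and gzip on; in the http block
```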
7.2 SEO optimization:
To increase the target web page's exposure in search engines, the web page publishing and intelligent popularization module performs SEO optimization on it. The main optimization strategies include:
Optimize the web page's meta information, such as title, description, and keywords, so that it conforms to search engines' indexing rules.
Optimize the content structure of the web page, such as using semantic HTML tags and a reasonable heading hierarchy, to improve content readability and relevance.
Optimize the link structure of the web page, such as the reasonable distribution of internal links and the authority of external links, to improve link quality and weight.
Optimize the loading speed of the web page, such as through file compression and the use of a Content Delivery Network (CDN), to improve user experience and search engine scoring.
Code example:
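As a small sketch of the meta-information step, the snippet below renders a page's title, description, and keyword tags from extracted fields; the function and field names are assumptions for this example:

```python
from html import escape

def render_seo_head(title, description, keywords):
    # Build escaped meta tags from the extracted page fields
    return '\n'.join([
        f'<title>{escape(title)}</title>',
        f'<meta name="description" content="{escape(description)}">',
        f'<meta name="keywords" content="{escape(", ".join(keywords))}">',
    ])

head = render_seo_head(
    'Product Launch',
    'A one-page overview of the new product.',
    ['product', 'launch'],
)
print(head)
```

The generated tags would be injected into the page head before publishing, alongside the structural and link optimizations listed above.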
7.3 Intelligent popularization engine:
After the web page is published, an intelligent popularization strategy is adopted to push it accurately to the target audience. The main tasks of the intelligent popularization engine include:
Analyze the content and theme of the web page, extract keywords and tags, and determine the direction and scope of promotion.
Collect and analyze user behavior data, such as browsing history, search terms, and interests, to construct user portraits.
Select the optimal promotion channels, such as search engine advertising, social media advertising, and content recommendation platforms, and formulate promotion strategies according to the characteristics and pricing models of the different channels.
Monitor promotion effects in real time, such as click-through rate and conversion rate, and dynamically adjust the promotion strategy and bids to optimize the budget and maximize results.
Code example:
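The channel-selection step of the intelligent popularization engine can be sketched as follows. This is a simplified illustration under assumed channel data (names, click-through rates, and costs are invented); a real engine would also incorporate user portraits and real-time bid feedback.

```python
def pick_channel(channels, budget):
    """Select the affordable channel with the best expected clicks per unit cost.

    Simplified sketch of the channel-selection step; the channel statistics
    would come from monitored promotion data in a real system.
    """
    affordable = [c for c in channels if c["cost_per_click"] <= budget]
    return max(affordable, key=lambda c: c["expected_ctr"] / c["cost_per_click"])

# Hypothetical channel statistics for illustration.
channels = [
    {"name": "search_ads",  "expected_ctr": 0.04, "cost_per_click": 0.50},
    {"name": "social_ads",  "expected_ctr": 0.06, "cost_per_click": 1.20},
    {"name": "content_rec", "expected_ctr": 0.03, "cost_per_click": 0.20},
]
best = pick_channel(channels, budget=1.0)
```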
Through analysis and optimization by the intelligent popularization engine, accurate popularization of the webpage can be realized, and the popularization efficiency and the return on investment are improved.
According to the online system for automatically converting PPT, Word, Excel, PDF, and other common documents into webpages, non-technical staff can also quickly produce professional webpages, webpage production difficulty and labor cost are greatly reduced, and webpage generation efficiency is improved.
The innovation point of the scheme is that the information such as document content, structure, typesetting, style and the like is automatically extracted through intelligent analysis and conversion processing on the uploaded document, mapped and converted into corresponding HTML webpage codes, and user-defined editing of the webpage content is supported. The technical scheme of the scheme can be adapted to various common document formats to generate the target webpage with good compatibility and high reduction degree.
Technical problem that this scheme was solved:
The method solves the problems that traditional webpage making requires professional front-end developers to write codes manually, the technical threshold is high, and the development efficiency is low;
According to the scheme, common documents such as Word, PPT, PDF and the like are automatically converted into the webpage through a file analysis and conversion technology, and manual coding is not needed, so that non-technical staff can quickly manufacture professional webpage, the technical threshold for webpage manufacture is greatly reduced, and the webpage development efficiency is improved;
The problem that the existing webpage making tool is limited in function and difficult to meet personalized webpage customization requirements is solved;
The scheme supports the uploading of documents in various formats by a user, extracts document contents and styles by an intelligent analysis engine, supports the online editing, modification and typesetting of webpage contents by the user, and meets the personalized webpage customization requirements in different scenes;
The problem of low manufacturing efficiency due to lack of an automatic and intelligent means in the webpage manufacturing process is solved;
The scheme adopts intelligent technologies such as file analysis, content extraction, format conversion and the like to realize automatic generation and conversion from the document to the webpage, manual intervention is not needed in the whole process, and the webpage manufacturing efficiency is greatly improved compared with that of the traditional mode.
The beneficial effects of this scheme are as follows:
Automated document conversion: according to the scheme, the document analysis module automatically extracts the structured data and style information of the document, so that the workload of manually converting the document is reduced, and the conversion efficiency is improved;
Intelligent webpage generation: through the webpage template matching module, the scheme can automatically recommend the most similar webpage template according to the content and structural characteristics of the document, so that the intelligent mapping from the document to the webpage is realized, and the threshold of webpage development is reduced;
Personalized template selection: the system presets a plurality of sets of webpage templates with different styles and layouts, and a user can select a preferred template to apply according to his or her own taste, so that the personalized requirements of the user are met;
Improved display effect: by automatically matching the optimal template, the scheme can generate a webpage with stronger visual attraction and more reasonable layout for the document content, thereby improving the display effect and user experience of the document content;
Development cost is saved: the traditional process of converting the document into the webpage requires the participation of professional webpage developers, and has higher cost. By means of automatic document analysis and template matching, requirements on professional skills are reduced, and development cost is saved;
Flexible and scalable: the template library of the scheme can be continuously expanded and updated, and new template design and layout styles are supported to be added. Meanwhile, the document analysis and template matching algorithm can be continuously optimized to adapt to different types of documents and user requirements;
Information propagation efficiency is improved: through automatically converting the document into the webpage, the scheme can accelerate the propagation speed of the document content, so that more people can conveniently access and view the document information, and the information propagation efficiency is improved.
In the embodiment of the application, intelligent analysis and conversion processing are carried out on the document to be converted into the webpage, key information such as document content, structure, typesetting, style and the like is automatically extracted, the key information is converted into document content characteristics, and the optimal webpage template with highest similarity score is automatically recommended through a pre-trained webpage template recommendation engine and the document content characteristics, so that intelligent mapping from the document to the webpage is realized, and non-technicians can also quickly manufacture professional webpage, thereby reducing the threshold of webpage development, webpage manufacturing difficulty and labor cost, and improving webpage generation efficiency.
Referring to fig. 2, an online webpage generating method based on document processing disclosed in the embodiment of the present application is applied to the online webpage generating system of fig. 1 in the above embodiment, and the online webpage generating method based on document processing mainly includes the following steps:
S201: acquiring a document to be converted;
S202: and analyzing the document to be converted through a pre-trained document analysis deep learning model to obtain key information.
The document analysis deep learning model at least comprises a named entity recognition model, a text abstract model, a semantic role labeling model and an emotion analysis model.
And analyzing the document to be converted through a pre-trained document analysis deep learning model, wherein the process of obtaining key information is shown as E1-E7.
E1: preprocessing a document to be converted to obtain a document to be converted in a preset format; the pretreatment at least comprises cleaning, word segmentation and word deactivation.
E2: and identifying key entities of the document to be converted through the named entity identification model.
E3: keywords (reflecting the subject matter and core content of the text) are extracted from the document to be converted by using a preset algorithm (TF-IDF algorithm).
E4: and generating a summary of the document to be converted through a text summary model.
E5: semantic roles (e.g., agent, events, time, places, etc., understand the semantic structure of the text) of the document to be converted are identified by the semantic role annotation model.
E6: the emotion tendencies (such as positive, negative, neutral, and the like) of the document to be converted are determined through an emotion analysis model, and the opinion and attitude of the author are known.
E7: and extracting key information from the key entities, the key words, the abstract, the semantic roles and the emotion tendencies.
The execution process and execution principle of E1-E7 are identical to those of A1-A7 of fig. 1 in the above embodiment, and reference is made thereto, and no further description is given here.
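The keyword-extraction step E3 above names the TF-IDF algorithm; a minimal self-contained sketch of it is shown below. Tokenization and stop-word removal (step E1) are assumed to have been done already, and the smoothed IDF formula is one common variant, not necessarily the one used by the system.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, top_k=3):
    """Rank the tokens of one document by TF-IDF against a small corpus.

    Sketch of step E3; uses a smoothed IDF: log((1+N)/(1+df)) + 1.
    """
    tf = Counter(doc_tokens)
    n_docs = len(corpus)

    def idf(term):
        df = sum(1 for d in corpus if term in d)
        return math.log((1 + n_docs) / (1 + df)) + 1

    scores = {t: (tf[t] / len(doc_tokens)) * idf(t) for t in tf}
    return [t for t, _ in sorted(scores.items(), key=lambda x: -x[1])[:top_k]]

# Tiny illustrative corpus of pre-tokenized documents.
corpus = [["web", "page"], ["page", "layout"], ["template", "style"]]
top = tfidf_keywords(["template", "template", "page"], corpus, top_k=1)
```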
S203: and converting the key information to obtain the document content characteristics.
S204: obtaining a target webpage template through a pre-trained webpage template recommending engine and document content characteristics; the target webpage template is the optimal webpage template with the highest similarity score between the document content characteristics and the template characteristics.
Specifically, a process of obtaining a target webpage template through a pre-trained webpage template recommendation engine and document content characteristics is shown as F1-F7:
F1: the document content features are converted into normalized feature vector representations.
F2: and carrying out similarity calculation on the standardized feature vector representation and the template feature vectors in the template feature library to obtain the similarity of the current document.
F3: and carrying out personalized recommendation of the template through a collaborative filtering algorithm, the similarity of the user history selection template and the current document to obtain a collaborative filtering result.
F4: and carrying out semantic matching on the document content characteristics and the template characteristic vectors through a neural network model convolutional neural network CNN in deep learning to obtain a deep learning result.
And F5: and optimizing a recommendation algorithm of a web page template recommendation engine through reinforcement learning technology and feedback of a preset recommendation template to obtain reinforcement learning results.
F6: and obtaining a template recommendation list according to the collaborative filtering result, the deep learning result and the reinforcement learning result.
F7: and selecting the target webpage template with the highest similarity score between the document content characteristics and the template characteristics from the template recommendation list.
The execution process and execution principle of F1-F7 are identical to those of B1-B7 of fig. 1 in the above embodiment, and reference is made thereto, and details thereof will not be repeated.
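The core of steps F1, F2, and F7 above is similarity scoring between the normalized document feature vector and each template feature vector; a cosine-similarity sketch is given below. The collaborative-filtering, CNN, and reinforcement-learning re-ranking of steps F3-F6 are omitted, and the template library contents are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(doc_vec, template_library):
    """Return the template whose feature vector best matches the document.

    Sketch of F1-F2 and F7; re-ranking steps F3-F6 are not modelled.
    """
    return max(template_library, key=lambda name: cosine(doc_vec, template_library[name]))

# Hypothetical template feature library.
library = {"news": [1.0, 0.0, 0.0], "portfolio": [0.0, 1.0, 0.0]}
best_template = recommend([0.9, 0.1, 0.0], library)
```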
S205: and correspondingly filling the document content data in the key information into a target webpage template, and generating a corresponding initial webpage.
Specifically, the process of correspondingly filling the document content data in the key information into the target webpage template and generating a corresponding initial webpage is shown as G1-G7.
G1: traversing the HTML code of the target web page template.
And G2: various placeholders (title placeholders, paragraph placeholders, etc.) of the target webpage template are identified from the hypertext markup language (HTML) code.
And G3: various types of content elements of the document content data in the key information are determined.
And G4: matching various content elements with various placeholders through preset matching rules; the preset matching rule is a rule for searching for a corresponding content element according to the type of the placeholder.
And G5: intelligent rewriting and optimizing of texts corresponding to the matched various content elements; intelligent rewriting is used for enabling texts to be concise and smooth; optimizing is used for improving the efficiency of browsing and information acquisition of web pages.
G6: and correspondingly filling placeholders of the target webpage templates with the intelligent rewritten and optimized various content elements to generate filled content.
And G7: performing intelligent typesetting and format adjustment on the filled content to generate a corresponding initial webpage; the intelligent typesetting at least comprises self-adaptive layout and interactive effect addition; the formatting is used to reconcile the populated content with the design style of the target web page template.
The execution process and execution principle of G1-G7 are identical to those of C1-C7 of fig. 1 in the above embodiment, and reference is made thereto, and no further description is given here.
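The placeholder identification and filling of steps G2-G4 and G6 above can be sketched with a simple typed-placeholder substitution. The `{{title}}`-style placeholder syntax is an assumption for illustration, and the intelligent rewriting and typesetting of steps G5 and G7 are not modelled here.

```python
import re

def fill_template(template_html, content):
    """Replace typed placeholders like {{title}} with matched content elements.

    Sketch of G2-G4 and G6; unmatched placeholders are left intact so a
    later pass can handle them.
    """
    def substitute(match):
        return content.get(match.group(1), match.group(0))

    return re.sub(r"\{\{(\w+)\}\}", substitute, template_html)

page = fill_template(
    "<h1>{{title}}</h1><p>{{paragraph}}</p>",
    {"title": "Quarterly Report", "paragraph": "Revenue grew."},
)
```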
S206: carrying out layout analysis on the initial webpage in a preset layout analysis mode to obtain an optimal layout scheme; the preset layout analysis mode is a mode for analyzing the characteristics of the visual elements and the spatial relationship of the visual elements.
And carrying out layout analysis on the initial webpage in a preset layout analysis mode, and obtaining an optimal layout scheme, wherein the process is shown as H1-H5.
H1: and carrying out screenshot on the initial webpage and carrying out image preprocessing on the screenshot.
H2: and extracting the features of the visual elements from the screenshot after the image pretreatment by a feature extraction method to obtain each key visual element of the webpage.
And H3: and analyzing the spatial relationship among the key visual elements of the webpage to obtain spatial relationship information.
H4: constructing a layout optimized reinforcement learning model according to each key visual element and the spatial relation information; the layout optimized reinforcement learning model at least comprises a web page layout state representation, an action space, a reward function and a reinforcement learning algorithm.
And H5: and obtaining an optimal layout scheme through the reinforcement learning model of layout optimization.
The execution process and execution principle of H1-H5 are identical to those of the web page intelligent layout optimization module of fig. 1 in the above embodiment, and reference is made thereto, and details thereof are not repeated here.
S207: and dynamically adjusting the initial webpage according to the optimal layout scheme to obtain the target webpage.
In S207, according to the optimal layout scheme, determining a corresponding modification parameter, and dynamically adjusting a cascading style sheet of the content block in the initial webpage according to the modification parameter under the responsive layout to obtain a target webpage; wherein the responsive layout is determined according to the device type and screen size.
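The cascading-style-sheet adjustment under a responsive layout described for S207 can be sketched as follows. The breakpoint widths, block identifier, and parameter names are illustrative assumptions; a real system would derive the modification parameters from the optimal layout scheme.

```python
def responsive_css(block_id, params):
    """Emit a media-query rule adjusting one content block for a device class.

    Sketch of the dynamic adjustment in S207; breakpoints are assumptions.
    """
    breakpoint = {"mobile": 600, "tablet": 1024}[params["device"]]
    return (
        f"@media (max-width: {breakpoint}px) {{\n"
        f"  #{block_id} {{ width: {params['width']}; "
        f"font-size: {params['font_size']}; }}\n"
        f"}}"
    )

rule = responsive_css("hero", {"device": "mobile", "width": "100%", "font_size": "14px"})
```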
When a webpage personalized request is received, determining personalized adjustment parameters corresponding to the webpage personalized request through a pre-constructed user portrait and intention understanding model; the personalized adjustment parameters at least comprise content recommendation, style and interaction mode, and the target webpage is dynamically adjusted according to the personalized adjustment parameters in the process of generating the target webpage.
Performing multidimensional evaluation on the target webpage through an intelligent webpage quality test and optimization engine; the multi-dimensional assessments include at least a page performance assessment, a user experience assessment, and an SEO friendliness assessment.
And hosting the target webpage through a preset Web server, and optimizing a search engine for the target webpage.
The execution process and execution principle of S201-S207 are identical to those of the web page online generation system of fig. 1 in the above embodiment, and reference is made thereto, and details thereof will not be repeated.
In the embodiment of the application, intelligent analysis and conversion processing are carried out on the document to be converted into the webpage, key information such as document content, structure, typesetting, style and the like is automatically extracted, the key information is converted into document content characteristics, and the optimal webpage template with highest similarity score is automatically recommended through a pre-trained webpage template recommendation engine and the document content characteristics, so that intelligent mapping from the document to the webpage is realized, and non-technicians can also quickly manufacture professional webpage, thereby reducing the threshold of webpage development, webpage manufacturing difficulty and labor cost, and improving webpage generation efficiency.
Based on the above embodiment, fig. 2 discloses a method for generating an online webpage based on document processing, and the embodiment of the application correspondingly discloses an online webpage generating device based on document processing, as shown in fig. 3, the online webpage generating device based on document processing includes:
a first obtaining unit 301, configured to obtain a document to be converted;
The parsing unit 302 is configured to parse the document to be converted through a pre-trained document parsing deep learning model to obtain key information;
A conversion unit 303, configured to convert the key information to obtain a document content feature;
a second obtaining unit 304, configured to obtain a target web page template through a pre-trained web page template recommendation engine and document content features; the target webpage template is the optimal webpage template with the highest similarity score between the document content characteristics and the template characteristics;
a filling generation unit 305, configured to correspondingly fill document content data in the key information into a target web page template, and generate a corresponding initial web page;
the analysis unit 306 is configured to perform layout analysis on the initial webpage in a preset layout analysis manner, so as to obtain an optimal layout scheme; the preset layout analysis mode is a mode for analyzing the characteristics of the visual elements and the spatial relationship of the visual elements;
the first adjusting unit 307 is configured to dynamically adjust the initial web page according to the optimal layout scheme, so as to obtain a target web page.
Further, the parsing unit 302 includes:
The preprocessing module is used for preprocessing the document to be converted to obtain the document to be converted in a preset format; the preprocessing at least comprises cleaning, word segmentation and stop-word removal;
the first identification module is used for identifying key entities of the document to be converted through the named entity identification model;
the first extraction module is used for extracting keywords from the document to be converted by using a preset algorithm;
The generation module is used for generating a summary of the document to be converted through the text summary model;
The second identification module is used for identifying the semantic roles of the document to be converted through the semantic role annotation model;
The first determining module is used for determining the emotion tendencies of the document to be converted through an emotion analysis model;
and the second extraction module is used for extracting key information from the key entity, the key word, the abstract, the semantic role and the emotion tendency.
Further, the second obtaining unit 304 includes:
The conversion module is used for converting the document content characteristics into standardized characteristic vector representations;
The computing module is used for carrying out similarity computation on the standardized feature vector representation and the template feature vectors in the template feature library to obtain the similarity of the current document;
The recommendation module is used for carrying out personalized recommendation of the template through a collaborative filtering algorithm, similarity of the user history selection template and the current document to obtain a collaborative filtering result;
The first matching module is used for carrying out semantic matching on the document content characteristics and the template feature vectors through a convolutional neural network, a neural network model in deep learning, to obtain a deep learning result;
The optimization module is used for optimizing a recommendation algorithm of the webpage template recommendation engine through reinforcement learning technology and feedback of a preset recommendation template to obtain reinforcement learning results;
the third acquisition module is used for acquiring a template recommendation list according to the collaborative filtering result, the deep learning result and the reinforcement learning result;
And the selecting module is used for selecting the target webpage template with the highest similarity score between the document content characteristics and the template characteristics from the template recommendation list.
Further, the filling generation unit includes:
The traversal module is used for traversing the hypertext markup language code of the target webpage template;
the third recognition module is used for recognizing various placeholders of the target webpage template through the hypertext markup language code;
The first determining module is used for determining various content elements of the document content data in the key information;
the second matching module is used for matching various content elements with various placeholders through preset matching rules; the preset matching rule is a rule for searching for corresponding content elements according to the type of the placeholder;
the rewrite optimization module is used for intelligently rewriting and optimizing the texts corresponding to the matched various content elements; intelligent rewriting is used for making the text concise and fluent; optimization is used for improving the efficiency of webpage browsing and information acquisition;
the filling module is used for correspondingly filling the intelligently rewritten and optimized various content elements into the placeholders of the target webpage template to generate filled content;
The generating module is used for intelligently typesetting and format adjusting the filled content to generate a corresponding initial webpage; the intelligent typesetting at least comprises self-adaptive layout and interactive effect addition; the formatting is used to reconcile the populated content with the design style of the target web page template.
Further, the analysis unit 306 includes:
the screenshot processing module is used for performing screenshot on the initial webpage and performing image preprocessing on the screenshot;
The feature extraction module is used for extracting the features of the visual elements from the screenshot after the image pretreatment by a feature extraction method to obtain each key visual element of the webpage;
The analysis module is used for analyzing the spatial relationship among the key visual elements of the webpage to obtain spatial relationship information;
the construction module is used for constructing a layout optimized reinforcement learning model according to each key visual element and the spatial relation information; the layout optimized reinforcement learning model at least comprises a webpage layout state representation, an action space, a reward function and a reinforcement learning algorithm;
and the fourth acquisition module is used for obtaining an optimal layout scheme through the reinforcement learning model of layout optimization.
Further, the first adjusting unit 307 includes:
The second determining module is used for determining corresponding modification parameters according to the optimal layout scheme;
the adjustment module is used for dynamically adjusting the cascading style sheet of the content blocks in the initial webpage according to the modification parameters under the responsive layout to obtain the target webpage; wherein the responsive layout is determined according to the device type and screen size.
Further, the online webpage generating device based on document processing further comprises:
The determining unit is used for determining personalized adjustment parameters corresponding to the webpage personalized request through a pre-constructed user portrait and intention understanding model when the webpage personalized request is received; the personalized adjustment parameters at least comprise content recommendation, style and interaction modes;
And the second adjusting unit is used for dynamically adjusting the target webpage according to the personalized adjusting parameters in the process of generating the target webpage.
Further, the online webpage generating device based on document processing further comprises:
The evaluation unit is used for performing multidimensional evaluation on the target webpage through the intelligent webpage quality test and optimization engine; the multi-dimensional assessments include at least a page performance assessment, a user experience assessment, and a search engine optimization friendliness assessment.
Further, the online webpage generating device based on document processing further comprises:
The hosting optimization unit is used for hosting the target webpage through a preset Web server and optimizing the search engine for the target webpage.
In the embodiment of the application, intelligent analysis and conversion processing are carried out on the document to be converted into the webpage, key information such as document content, structure, typesetting, style and the like is automatically extracted, the key information is converted into document content characteristics, and the optimal webpage template with highest similarity score is automatically recommended through a pre-trained webpage template recommendation engine and the document content characteristics, so that intelligent mapping from the document to the webpage is realized, and non-technicians can also quickly manufacture professional webpage, thereby reducing the threshold of webpage development, webpage manufacturing difficulty and labor cost, and improving webpage generation efficiency.
The embodiment of the application also provides a storage medium, which comprises stored instructions, wherein the equipment where the storage medium is located is controlled to execute the online webpage generating method based on the document processing when the instructions run.
The embodiment of the application also provides an electronic device, the structure of which is shown in fig. 4, specifically including a memory 401 and one or more instructions 402, where the one or more instructions 402 are stored in the memory 401, and configured to be executed by the one or more processors 403 to perform the above-mentioned online webpage generating method based on document processing.
For the foregoing method embodiments, for simplicity of explanation, the methodologies are shown as a series of acts, but one of ordinary skill in the art will appreciate that the present application is not limited by the order of acts, as some steps may, in accordance with the present application, occur in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
It should be noted that, in the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described as different from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other. For system-like embodiments, the description is relatively simple as it is substantially similar to method embodiments, and reference should be made to the description of method embodiments for relevant points.
The steps in the method of the embodiments of the present application may be sequentially adjusted, combined, and deleted according to actual needs.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing is merely a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application, which are intended to be comprehended within the scope of the present application.