Disclosure of Invention
The invention provides a low-cost, fast, accurate, and automatic spine recognition method, which solves the problem of book recognition on library shelves and in similar scenes.
Aiming at the defects of the prior art, the invention provides a book identification method based on spine visual information, which comprises the following steps:
step 1, obtaining book spine pictures marked with spine segmentation as a training set, training a deep convolutional neural network model for segmenting the spine through the training set to obtain a spine segmentation model, and performing instance segmentation on the collected book pictures on a shelf by using the spine segmentation model to obtain a plurality of spine pictures;
step 2, marking book categories for each spine picture to construct a spine classification data set, training a deep convolutional neural network model for spine classification through the spine classification data set to obtain a spine feature extraction model, extracting spine visual features of each book in a book database by using the spine feature extraction model, and integrating the spine visual features to construct a spine visual database;
and step 3, inputting the to-be-recognized spine picture containing a plurality of spines into the spine segmentation model for instance segmentation, inputting the segmentation result into the spine feature extraction model to obtain the visual feature vectors of the spines in the to-be-recognized spine picture, and matching the visual feature vectors with the database to recognize the book categories of the spines in the to-be-recognized spine picture.
The book identification method based on the spine visual information comprises, in the step 1, a data set construction step: books on a shelf are photographed from multiple angles using a picture acquisition device, and for each spine area in the shooting result, four coordinate points (x_N, y_N)_i, N∈[1,4], are determined to form a closed quadrangle b_i, which is box-selected to mark the spine for segmentation.
The book identification method based on the spine visual information, wherein the step 2 comprises a book category labeling step: all spine regions B_i in the book spine pictures are obtained; the minimum circumscribed rectangle R_i of each spine region B_i is obtained, together with its four vertices (X_N, Y_N)_i, N∈[1,4], and the inclination angle θ_i of the long side of R_i; the original image is rotated by θ_i via an affine transformation and then cropped according to (X_N, Y_N)_i to obtain a regular spine picture BE_i; the spine pictures BE_i are manually labeled with category labels, and spine pictures of the same book have the same label.
The book identification method based on the spine visual information, wherein the construction method of the deep convolutional neural network model for spine classification in the step 2 comprises: a multi-layer deep convolutional neural network is constructed from residual modules as the feature extraction network m_2, and a fully connected classification layer (classifier) using an additive angular margin loss function is appended to the end of m_2 to obtain the structure of the deep convolutional neural network model for spine classification;
the step 2 includes training a model M_2 = m_2 + classifier according to the paradigm of a classification task using the spine classification data set: the input is a spine picture scaled to a fixed size, and the training target is the label to which the spine picture belongs; after M_2 is trained, the feature map F_i output by the feature extraction network m_2 in the model is taken as the visual feature vector of the spine.
The book identification method based on the spine visual information, wherein the step 3 comprises sending the spine picture to be identified into the spine segmentation model for processing to obtain the spine pictures BE_i of all books in the picture to be identified; in the identification process, cosine similarity is used to measure the similarity between two spine visual characterization vectors F_a = [a_1, a_2, …, a_512] and F_b = [b_1, b_2, …, b_512]; the spine feature extraction model m_2 calculates the visual characterization F_i of each spine picture BE_i and performs a nearest-neighbor search against the data in the spine visual database to obtain the several spine category ids with the highest similarity to the target spine picture in the spine visual database, the category id with the highest similarity being taken as the final identification result.
The invention also provides a book identification system based on the spine visual information, which comprises:
the system comprises a first training module, a second training module and an identification module, wherein the first training module is used for obtaining book spine pictures marked with book spine segmentation as a training set, training a deep convolutional neural network model for segmenting book spines through the training set to obtain a book spine segmentation model, and using the book spine segmentation model to perform instance segmentation on collected book pictures on a shelf to obtain a plurality of book spine pictures;
the second training module is used for marking book categories for each spine picture, constructing a spine classification data set, training a deep convolutional neural network model for spine classification through the spine classification data set to obtain a spine feature extraction model, extracting the spine visual features of each book in a book database by using the spine feature extraction model, and integrating the spine visual features to construct a spine visual database;
and the identification module is used for inputting the to-be-identified book spine picture containing a plurality of book spines into the book spine segmentation model for instance segmentation, inputting the segmentation result into the book spine feature extraction model to obtain the visual feature vector of each book spine in the to-be-identified book spine picture, and matching the visual feature vector with the database to identify the book category of each book spine in the to-be-identified book spine picture.
The book identification system based on the spine visual information, wherein the first training module comprises: books on the shelf are photographed from multiple angles using a picture acquisition device, and for each spine area in the shooting result, four coordinate points (x_N, y_N)_i, N∈[1,4], are determined to form a closed quadrangle b_i, which is box-selected to mark the spine segmentation.
The book identification system based on the spine visual information, wherein the second training module comprises: all spine regions B_i in the book spine pictures are obtained; the minimum circumscribed rectangle R_i of each spine region B_i is obtained, together with its four vertices (X_N, Y_N)_i, N∈[1,4], and the inclination angle θ_i of the long side of R_i; the original image is rotated by θ_i via an affine transformation and then cropped according to (X_N, Y_N)_i to obtain a regular spine picture BE_i; the spine pictures BE_i are manually labeled with category labels, and spine pictures of the same book have the same label.
The book identification system based on the spine visual information, wherein the construction process of the deep convolutional neural network model for spine classification in the second training module comprises: a multi-layer deep convolutional neural network is constructed from residual modules as the feature extraction network m_2, and a fully connected classification layer (classifier) using an additive angular margin loss function is appended to the end of m_2 to obtain the structure of the deep convolutional neural network model for spine classification;
the second training module includes: training a model M_2 = m_2 + classifier according to the paradigm of a classification task using the spine classification data set: the input is a spine picture scaled to a fixed size, and the training target is the label to which the spine picture belongs; after M_2 is trained, the feature map F_i output by the feature extraction network m_2 in the model is taken as the visual feature vector of the spine.
The book identification system based on the spine visual information, wherein the identification module is used for sending the spine picture to be identified into the spine segmentation model for processing to obtain the spine pictures BE_i of all books in the picture to be identified; in the identification process, cosine similarity is used to measure the similarity between two spine visual characterization vectors; the spine feature extraction model m_2 calculates the visual characterization F_i of each spine picture BE_i and performs a nearest-neighbor search against the data in the spine visual database to obtain the several spine category ids with the highest similarity to the target spine picture in the spine visual database, the category id with the highest similarity serving as the final identification result.
According to the scheme, the invention has the advantages that:
the book identification is a core step of most book management work, the technology of the application can automate the step under the conditions of low cost and high precision, thereby greatly reducing the manpower and finally achieving the purpose of replacing manual book arrangement by a machine. The method identifies the spine pictures of the book based on the deep learning algorithm, does not need to configure complicated hardware facilities, and ensures low cost; all visual information of the spine target is utilized, the method is not limited by a dictionary set on which a character recognition method depends, newly added books are supported in the collection of the books, and the method has higher accuracy rate and better robustness and expandability; according to different application requirements, the spine pictures of a single spine or a series of books on the shelf can be identified individually or in batches, and the high efficiency of book identification is ensured.
Detailed Description
Aiming at the book identification problem on a library shelf or other scenes, spine pictures are identified so as to determine the category of the spine pictures. The method mainly comprises the following steps: 1) collecting book spine pictures of books on a library shelf, and manually marking the pictures to construct a spine segmentation and spine classification data set; 2) constructing a convolutional neural network for extracting the depth features of the spine image, and training by using training data to obtain a feature extraction model; 3) in the testing process, a picture of one side of the spine of a book on the shelf can be shot, the spine is subjected to instance segmentation, then a trained model is utilized to obtain visual feature vectors corresponding to the spine picture, and then the spine picture is matched with a library database to identify the class of the book corresponding to the spine.
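The three stages above chain together as segment, then extract, then match. A minimal sketch of that pipeline is given below; `segment`, `extract`, and the toy database are hypothetical stand-ins for the trained models and library data, not names from the disclosure.

```python
import numpy as np

def recognize_row(image, segment, extract, database):
    """Pipeline sketch: instance-segment the spines in an on-shelf picture,
    extract one visual feature vector per spine, then match each vector
    against the library database by cosine similarity."""
    crops = segment(image)                      # rectified spine crops
    results = []
    for crop in crops:
        f = extract(crop)                       # e.g. a 512-d characterization
        f = f / np.linalg.norm(f)
        best_id, best_sim = None, -1.0
        for book_id, g in database.items():
            sim = float(f @ (g / np.linalg.norm(g)))
            if sim > best_sim:
                best_id, best_sim = book_id, sim
        results.append(best_id)
    return results

# toy stand-ins for the trained models and database
rng = np.random.default_rng(0)
db = {"book_a": rng.normal(size=512), "book_b": rng.normal(size=512)}
segment = lambda img: [img]                     # pretend one spine was found
extract = lambda crop: db["book_b"] + 0.01 * rng.normal(size=512)
print(recognize_row(np.zeros((8, 8)), segment, extract, db))  # → ['book_b']
```

The noisy query vector still matches its source entry because cosine similarity is insensitive to the small perturbation relative to two independent 512-dimensional vectors.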
In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below. In order to achieve the above object, the present invention provides a spine recognition method based on deep convolutional neural network as shown in fig. 1, including the following steps:
1. Training the spine segmentation model. First, in the real environment of a library, a large number of pictures of books on shelves are collected. Then, a portion of the collected on-shelf book pictures are manually annotated to construct the spine segmentation data set; as an instance segmentation task, a deep convolutional neural network model that segments the spine is designed, and the spine segmentation model is trained end-to-end using the spine segmentation data set.
2. Training the spine classification model. Instance segmentation is performed on the collected on-shelf book pictures using the trained spine segmentation model. The segmented spine pictures are manually annotated with class ids to construct the spine classification data set; a deep convolutional neural network model for spine classification is designed. As a classification task, the model is trained end-to-end using the spine classification data set, and the spine feature extraction network is derived from the trained model.
3. Spine recognition. First, the trained spine feature extraction model computes the spine visual representation of each book in the library database, and the representations are added to and stored in the database. When identifying books on a shelf, a picture is taken of the spine side of a row of target books, the spine segmentation model automatically segments all spine regions, the spine feature extraction model calculates the visual representation of each spine region, and finally a nearest-neighbor search over the library database entries with the spine visual representations determines the library information corresponding to each target spine.
The invention is a software algorithm solution to the spine classification and book identification problems: it requires no installation or configuration of a complex hardware system, and it replaces manpower in the key steps of book identification, greatly reducing labor cost. In the identification process, all visual features of the spine region of the target picture are utilized, rather than character information alone, so the method can identify spines in any language or artistic design and better resists factors such as ambient light changes and book wear. Because feature vector matching determines the recognition result, the dependency of character recognition methods on a dictionary set is eliminated, and newly added library books are conveniently supported.
Training of spine instance segmentation model
1) Constructing the spine segmentation data set. In a real library scene, books on the shelf are photographed with an RGB picture acquisition device. To obtain different pictures of the same spine after segmentation, each book on the bookshelf is photographed from three different angles (as shown in fig. 2), keeping as much of each book as possible within the shooting range while ensuring the picture is clear. In this embodiment, about 300 on-shelf books are collected, and the original picture resolution is 1080 × 1920. 90 on-shelf book pictures are manually annotated: for each spine region in a picture, four coordinate points (x_N, y_N)_i, N∈[1,4], are determined to form a closed quadrangle b_i, which is box-annotated (as in fig. 3) to construct the spine segmentation data set. 80% of the pictures are used as the training set and 20% as the test set.
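One plain-Python way to represent a four-point annotation b_i is sketched below; the record fields and file name are illustrative, not a format prescribed by the disclosure. The shoelace formula gives the area of the closed quadrangle, a useful sanity check that the four points are ordered and non-degenerate.

```python
def shoelace_area(points):
    """Area of a closed polygon given as [(x, y), ...] via the shoelace formula."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# one annotation record b_i: four (x_N, y_N) points, N in [1, 4]
annotation = {
    "image": "shelf_001.jpg",          # hypothetical file name
    "spine_polygon": [(120, 40), (160, 42), (158, 900), (118, 898)],
}
assert shoelace_area(annotation["spine_polygon"]) > 0  # non-degenerate quadrangle
```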
Fig. 3: manually labeled spine regions (white quadrangles are the annotation boxes).
2) Training the spine segmentation model. The instance segmentation task in computer vision is to not only detect the position of an object in a picture but also segment the object from the background at the pixel level. The spine segmentation task can be realized with a mature instance segmentation model (such as the Mask R-CNN framework). The spine segmentation model is trained end-to-end using the spine segmentation data set: the original on-shelf book images and the corresponding spine box annotations are input, and after training the model segments and outputs all spine regions (as shown in fig. 4).
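Instance-segmentation training in a Mask R-CNN-style framework consumes per-pixel masks, so the quadrangle labels must first be rasterized. A minimal NumPy sketch using even-odd ray casting at pixel centers is shown below; `quad_to_mask` is an illustrative helper, not code from the disclosure.

```python
import numpy as np

def quad_to_mask(quad, height, width):
    """Rasterize a closed polygon [(x, y), ...] into a boolean pixel mask
    using the even-odd (ray-casting) rule, evaluated at pixel centers."""
    ys, xs = np.mgrid[0:height, 0:width].astype(float)
    inside = np.zeros((height, width), dtype=bool)
    n = len(quad)
    for i in range(n):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % n]
        # toggle pixels whose leftward horizontal ray crosses this edge
        crosses = (y1 > ys) != (y2 > ys)
        x_at_y = (x2 - x1) * (ys - y1) / (y2 - y1 + 1e-12) + x1
        inside ^= crosses & (xs < x_at_y)
    return inside

# an axis-aligned quadrangle label on a small 12x12 test image
mask = quad_to_mask([(2, 2), (9, 2), (9, 9), (2, 9)], 12, 12)
```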
3) Other possible embodiments. In this step, the on-shelf book pictures may be collected in an archive or other similar scene, and the same bookshelf may be photographed from a different number of viewing angles; to extract the spine regions from the on-shelf book pictures, the spine instance segmentation model may also be implemented with other architectures, such as PolarMask, SOLO, BlendMask, and the like.
Training of spine classification models
1) Acquiring spine pictures and constructing the spine classification data set. After the spine segmentation model M_1 is trained, instance segmentation is performed on all collected on-shelf book pictures to obtain all spine regions B_i in the pictures. Because each model output B_i is an irregular region composed of the pixels classified as book in the picture, the minimum circumscribed rectangle R_i of B_i is computed, together with its four vertices (X_N, Y_N)_i, N∈[1,4], and the inclination angle θ_i of the long side of R_i; the original image is rotated by θ_i via an affine transformation and then cropped according to (X_N, Y_N)_i to obtain a regular spine picture BE_i (see fig. 5). The spine pictures are manually labeled with category labels, so that spine pictures of the same book have the same label.
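The geometric core of the rectification step is a 2-D rotation by the inclination angle θ_i about the rectangle's center; an image library would apply the same matrix to the pixels before cropping at the rotated vertices. A sketch with a hypothetical helper, using an upright 40 × 300 rectangle tilted by 30° as the test case:

```python
import numpy as np

def rotate_points(points, center, theta):
    """Rotate (x, y) points by theta radians about center."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    pts = np.asarray(points, dtype=float) - center
    return pts @ rot.T + center

# a spine's minimum circumscribed rectangle R_i, tilted by theta_i = 30 degrees
theta_i = np.deg2rad(30.0)
upright = np.array([(0.0, 0.0), (40.0, 0.0), (40.0, 300.0), (0.0, 300.0)])
center = upright.mean(axis=0)
tilted = rotate_points(upright, center, theta_i)

# rotating back by -theta_i makes the rectangle axis-aligned again,
# after which cropping reduces to slicing its bounding box
recovered = rotate_points(tilted, center, -theta_i)
```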
2) Extracting the visual representation of spine pictures. An 18-layer deep convolutional neural network is constructed from residual modules as the feature extraction network m_2, and a fully connected classification layer (classifier) using an additive angular margin loss function (see formula 1) is appended at the end (see fig. 6). The model M_2 = m_2 + classifier is trained according to the classification task paradigm using the spine classification data set: the input is a spine picture scaled to a fixed size (800 × 80), and the training target is the correct label (i.e., the class id) to which the spine picture belongs. After M_2 is trained, the feature map F_i output by m_2 in the model is taken as the visual representation of the spine.
where N is the number of samples in the mini-batch, s and m are hyper-parameters of the method, y_i is the ground-truth category of sample i, n is the number of categories, and θ is the angle between the weight vector and the feature vector in the model computation.
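In the notation of the symbol list above, the additive angular margin loss referred to as formula 1 follows the standard ArcFace construction: the target-class logit cos θ_{y_i} is replaced by cos(θ_{y_i} + m) before scaling by s and applying softmax cross-entropy. A NumPy sketch is given below (illustrative, not the disclosure's code; the small s in the example is chosen only to keep the toy losses away from floating-point underflow):

```python
import numpy as np

def additive_angular_margin_loss(features, weights, labels, s=64.0, m=0.5):
    """ArcFace-style loss: L2-normalize features and class weights,
    add margin m to the target-class angle, scale by s, then
    average softmax cross-entropy over the mini-batch."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = np.clip(f @ w, -1.0, 1.0)                 # cos(theta), shape (N, n)
    idx = np.arange(len(labels))
    logits = cos.copy()
    logits[idx, labels] = np.cos(np.arccos(cos[idx, labels]) + m)
    logits *= s
    z = logits - logits.max(axis=1, keepdims=True)  # stable log-softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_p[idx, labels].mean()

# with features perfectly aligned to their class weights, the margin
# still leaves a positive loss, which shrinks as m -> 0
features = np.eye(4)                 # 4 samples, 4-dim features
weights = np.eye(4)                  # columns are the 4 class weight vectors
labels = np.array([0, 1, 2, 3])
loss_m = additive_angular_margin_loss(features, weights, labels, s=4.0, m=0.5)
loss_0 = additive_angular_margin_loss(features, weights, labels, s=4.0, m=0.0)
```

The margin penalizes even perfectly classified samples, which is what pushes same-class features into a tighter angular cluster than plain softmax would.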
3) Other possible embodiments. In this step, the spine classification model may be composed of more layers of residual modules, or constructed from other classical feature extraction networks such as VGG or Inception, or from other self-designed deep convolutional networks; the dimension of the feature vector finally taken for a single spine picture may also vary.
Book identification
1) The feature extraction network m_2 calculates the visual representations of all spines in the library; in this example, the visual representation F_i of each book is a 512-dimensional vector. All vectors are stored in a single file Dict and saved in the library database so that they can be read in at once during retrieval.
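Storing all 512-dimensional representations in one file can be sketched as below; `np.savez` and the isbn-style ids are illustrative choices, since the disclosure only requires a single file Dict holding every vector for one-shot loading.

```python
import os
import tempfile
import numpy as np

def build_spine_dict(spine_features, path):
    """Save {book_id: 512-d characterization} as one .npz file for
    one-shot loading at retrieval time."""
    np.savez(path, **{str(k): v for k, v in spine_features.items()})

def load_spine_dict(path):
    """Read the whole spine dictionary back into memory at once."""
    with np.load(path) as data:
        return {k: data[k] for k in data.files}

rng = np.random.default_rng(1)
features = {"isbn_0001": rng.normal(size=512), "isbn_0002": rng.normal(size=512)}
path = os.path.join(tempfile.gettempdir(), "spine_dict.npz")
build_spine_dict(features, path)
restored = load_spine_dict(path)
```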
2) To identify the class ids of a row of target books, a picture is first taken of the spine side and fed into the spine segmentation model M_1 for processing to obtain the spine pictures BE_i of all books in the picture. In the identification process, cosine similarity (formula 2) is used to measure the similarity between two spine visual characterization vectors F_a = [a_1, a_2, …, a_512] and F_b = [b_1, b_2, …, b_512], where F_a is the visual characterization vector of a spine in the picture to be recognized and F_b is a visual characterization vector in the spine visual database. The spine feature extraction model m_2 calculates the visual characterization F_i of each spine picture BE_i and performs a nearest-neighbor search against Dict in the library database to obtain the 5 spine category ids (top5) with the highest similarity to the target spine picture in the database; the id with the highest similarity is taken as the final identification result.
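Cosine similarity (formula 2) and the top-5 nearest-neighbor lookup can be sketched as follows; the code and the toy database are illustrative, with `k=5` matching the top5 retrieval described above.

```python
import numpy as np

def cosine_similarity(fa, fb):
    """cos(F_a, F_b) = F_a . F_b / (|F_a| |F_b|)  -- formula 2."""
    return float(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb)))

def top_k_ids(query, spine_dict, k=5):
    """Return the k category ids most similar to the query vector."""
    scored = sorted(spine_dict.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [book_id for book_id, _ in scored[:k]]

rng = np.random.default_rng(2)
db = {f"id_{i}": rng.normal(size=512) for i in range(10)}
query = 0.9 * db["id_7"] + 0.05 * rng.normal(size=512)  # noisy view of id_7
print(top_k_ids(query, db)[0])  # → id_7
```

Because cosine similarity ignores vector magnitude, the 0.9 rescaling has no effect on the ranking; only the small additive noise could perturb it, and it is far too weak relative to two independent 512-dimensional vectors.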
3) Other possible embodiments. Other loss functions may be used when training the spine classification network; when the feature extraction network computes the library database, one file may instead be stored per book's visual representation vector, with the files read in and matched in a loop during retrieval; when performing the feature-vector nearest-neighbor search, other criteria may be used to evaluate the similarity between vectors, such as Euclidean distance or other distance measures.
In this embodiment, a target database (probe) containing 5580 spine pictures to be recognized and a test database (gallery) containing 3700 collected spine pictures are constructed by simulation. The spine pictures in probe are traversed, a nearest-neighbor search is performed against the visual representation Dict of gallery, and the entry with the greatest similarity is taken as the final class id identification result. Statistical analysis shows that the book category id identification accuracy reaches 99.32%. Most matching errors occur because books in the same series have overly similar spines; considering that books of the same series are generally shelved in the same area, the bookshelf position judgment accuracy reaches 99.93%, which is sufficient for shelving and unshelving requirements.
The following are system examples corresponding to the above method examples, and this embodiment can be implemented in cooperation with the above embodiments. The related technical details mentioned in the above embodiments are still valid in this embodiment, and are not described herein again in order to reduce repetition. Accordingly, the related-art details mentioned in the present embodiment can also be applied to the above-described embodiments.
The invention also provides a book identification system based on the spine visual information, which comprises the following steps:
the system comprises a first training module, a second training module and an identification module, wherein the first training module is used for obtaining book spine pictures marked with book spine segmentation as a training set, training a deep convolutional neural network model for segmenting book spines through the training set to obtain a book spine segmentation model, and using the book spine segmentation model to perform instance segmentation on collected book pictures on a shelf to obtain a plurality of book spine pictures;
the second training module is used for marking book categories for each spine picture, constructing a spine classification data set, training a deep convolutional neural network model for spine classification through the spine classification data set to obtain a spine feature extraction model, extracting the spine visual features of each book in a book database by using the spine feature extraction model, and integrating the spine visual features to construct a spine visual database;
and the identification module is used for inputting the to-be-identified book spine picture containing a plurality of book spines into the book spine segmentation model for instance segmentation, inputting the segmentation result into the book spine feature extraction model to obtain the visual feature vector of each book spine in the to-be-identified book spine picture, and matching the visual feature vector with the database to identify the book category of each book spine in the to-be-identified book spine picture.
The book identification system based on the spine visual information, wherein the first training module comprises: books on the shelf are photographed from multiple angles using a picture acquisition device, and for each spine area in the shooting result, four coordinate points (x_N, y_N)_i, N∈[1,4], are determined to form a closed quadrangle b_i, which is box-selected to mark the spine segmentation.
The book identification system based on the spine visual information, wherein the second training module comprises: all spine regions B_i in the book spine pictures are obtained; the minimum circumscribed rectangle R_i of each spine region B_i is obtained, together with its four vertices (X_N, Y_N)_i, N∈[1,4], and the inclination angle θ_i of the long side of R_i; the original image is rotated by θ_i via an affine transformation and then cropped according to (X_N, Y_N)_i to obtain a regular spine picture BE_i; the spine pictures BE_i are manually labeled with category labels, and spine pictures of the same book have the same label.
The book identification system based on the spine visual information, wherein the construction process of the deep convolutional neural network model for spine classification in the second training module comprises: a multi-layer deep convolutional neural network is constructed from residual modules as the feature extraction network m_2, and a fully connected classification layer (classifier) using an additive angular margin loss function is appended to the end of m_2 to obtain the structure of the deep convolutional neural network model for spine classification;
the second training module includes: training a model M_2 = m_2 + classifier according to the paradigm of a classification task using the spine classification data set: the input is a spine picture scaled to a fixed size, and the training target is the label to which the spine picture belongs; after M_2 is trained, the feature map F_i output by the feature extraction network m_2 in the model is taken as the visual feature vector of the spine.
The book identification system based on the spine visual information, wherein the identification module is used for sending the spine picture to be identified into the spine segmentation model for processing to obtain the spine pictures BE_i of all books in the picture to be identified; in the identification process, cosine similarity is used to measure the similarity between two spine visual characterization vectors; the spine feature extraction model m_2 calculates the visual characterization F_i of each spine picture BE_i and performs a nearest-neighbor search against the data in the spine visual database to obtain the several spine category ids with the highest similarity to the target spine picture in the spine visual database, the category id with the highest similarity serving as the final identification result.
The specific scenarios of the invention can be as follows:
1. When a reader borrows a specific book, even if the bookshelf position has been retrieved, the reader still has to search for the target book among the many compartments of the bookshelf. The application can help readers quickly identify the target book among crowded compartments.
2. After readers return books, the books need to be reshelved for the next borrowing, and each book's shelf position must be determined when it is put back. The application can photograph a whole row of books at once, identify them, and directly output the bookshelf position of every book.
3. Readers may misplace books after reading them, so during routine library inspection it must be checked whether each book is in the correct bookshelf position. The workload of this task is enormous and nearly impossible to complete manually; the application enables fast and accurate book inspection.
4. By deploying the algorithm of the application on a mobile robot platform equipped with a robotic arm, unmanned operation of the full library-management workflow can be realized, from borrowing to returning and from inspection to shelving. The technique of the application gives the robot accurate perception of books, and combined with the manipulation ability of the arm, machines can truly replace manual work.