TWI870063B - The method, the computing device, the computer-readable storage medium, and the computer program product for generating an image - Google Patents
- Publication number: TWI870063B
- Application number: TW112139431A
- Authority: TW (Taiwan)
Description
The present application relates to a method, a computer device, a computer-readable recording medium, and a computer program product, and in particular to a method, a computer device, a computer-readable recording medium, and a computer program product for generating an image.
Thanks to the development of artificial intelligence, many drawing programs and image generators, after adopting artificial intelligence technology, can produce the image files that users expect, helping media creators (for example, YouTubers or forum hosts) and the general public deepen their audience's impression of the content and/or scenario descriptions of their published articles through artificial intelligence images.
However, existing drawing programs and image generators require the user to enter a corresponding command set (prompt) before the expected image file can be generated. In other words, although the user can eventually obtain the expected image file, the user must first compose a textual description along with the prompts corresponding to each specific photo-editing operation.
In view of the prior art described above, the purpose of the present application is to remedy its shortcomings. Specifically, one object of the present application is to solve the problem that a user must enter a corresponding prompt to generate the desired image, and in particular that a user unfamiliar with prompts must do so.
The present application provides a method for generating an image, suitable for generating an artificial intelligence image through a computer device. The method includes: receiving at least one input image; generating a keyword character set based on the at least one input image; after receiving a photo-editing request sent from any one of at least one photo-editing button, performing at least one string editing operation on the keyword character set based on the editing instruction set corresponding to the photo-editing request to produce an edited character set; and generating an artificial intelligence image based on the edited character set.
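The four claimed steps can be sketched as a minimal pipeline. Everything below is illustrative: the model calls are pure-Python stand-ins, and the "oil_painting" button and its editing instruction set are hypothetical examples, not taken from the patent.

```python
def describe_image(image_bytes):
    # Stand-in for the image description model (step 2).
    return ["cat", "sofa", "sunlight"]

EDIT_INSTRUCTION_SETS = {
    # editing button name -> editing instruction set (hypothetical)
    "oil_painting": ["oil painting", "thick brushstrokes"],
}

def apply_edit(keywords, editing_request):
    # Step 3: string editing operation driven by the pressed button.
    return keywords + EDIT_INSTRUCTION_SETS[editing_request]

def generate_image(character_set):
    # Stand-in for the image generation model (step 4); returns the
    # prompt string it would render instead of an actual image.
    return ", ".join(character_set)

keywords = describe_image(b"...")               # steps 1-2
edited = apply_edit(keywords, "oil_painting")   # step 3
print(generate_image(edited))
# cat, sofa, sunlight, oil painting, thick brushstrokes
```

The point of the design is visible even in this toy form: the user presses a button, and the prompt engineering happens behind the scenes.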
In some embodiments, generating the keyword character set based on the at least one input image includes inputting the at least one input image into an image description model and outputting the keyword character set through the image description model. The image description model automatically generates the keyword character set corresponding to the at least one input image.
In some embodiments, generating the artificial intelligence image based on the edited character set includes inputting the edited character set into an image generation model and outputting the artificial intelligence image through the image generation model. The image generation model automatically generates the artificial intelligence image corresponding to the edited character set.
In some embodiments, the method for generating an image further includes automatically generating the at least one photo-editing button according to at least one of the content of the keyword character set and the user's editing history.
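A minimal sketch of how such buttons might be derived; the keyword-to-button mapping and the frequency ranking rule are assumptions for illustration, not the patent's algorithm.

```python
# Hypothetical rule: buttons come from the keyword content first, then
# from the user's most frequently used past operations.
KEYWORD_RULES = {"cat": "cartoonize", "portrait": "beautify"}

def suggest_buttons(keywords, history, limit=3):
    suggestions = []
    for kw in keywords:
        if kw in KEYWORD_RULES and KEYWORD_RULES[kw] not in suggestions:
            suggestions.append(KEYWORD_RULES[kw])
    # History-driven suggestions, most frequently used first.
    for op in sorted(set(history), key=history.count, reverse=True):
        if op not in suggestions:
            suggestions.append(op)
    return suggestions[:limit]

print(suggest_buttons(["cat", "sofa"], ["oil_painting", "oil_painting", "sketch"]))
# ['cartoonize', 'oil_painting', 'sketch']
```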
In some embodiments, the method for generating an image further includes determining whether to stack the at least one string editing operation based on a stacking attribute value corresponding to each of the at least one photo-editing button.
In some embodiments, the stacking attribute value is set as stackable or non-stackable.
In some embodiments, the stacking attribute value is set according to at least one of the user's preference settings and a default value.
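The stacking decision above can be sketched as follows; the button names and their stacking attribute values are invented for illustration.

```python
# Hypothetical stacking attribute values per editing button.
STACKING_ATTRIBUTE = {"add_flowers": "stackable", "oil_painting": "non-stackable"}

def apply_operation(active_ops, new_op):
    """Stack the operation if its attribute allows it; otherwise keep
    only the latest occurrence of that operation."""
    if STACKING_ATTRIBUTE.get(new_op) == "stackable":
        return active_ops + [new_op]
    return [op for op in active_ops if op != new_op] + [new_op]

ops = []
ops = apply_operation(ops, "add_flowers")
ops = apply_operation(ops, "add_flowers")   # stackable: applied twice
ops = apply_operation(ops, "oil_painting")
ops = apply_operation(ops, "oil_painting")  # non-stackable: applied once
print(ops)  # ['add_flowers', 'add_flowers', 'oil_painting']
```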
In some embodiments, the method for generating an image further includes determining the degree to which the at least one string editing operation edits the keyword character set based on an editing weight value corresponding to each of the at least one photo-editing button.
In some embodiments, the editing weight value is set according to at least one of the content of the keyword character set, the user's editing history, and a default value.
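One way an editing weight value could control the degree of an edit is the "(token:weight)" prompt-weighting convention used by some Stable Diffusion front ends; the patent does not mandate this syntax, so the sketch is an assumption.

```python
def weight_tokens(tokens, weights):
    # Annotate each token with its editing weight; 1.0 means unweighted.
    parts = []
    for tok in tokens:
        w = weights.get(tok, 1.0)
        parts.append(f"({tok}:{w})" if w != 1.0 else tok)
    return ", ".join(parts)

# The weight could be derived from keyword content, editing history, or a
# default value, as the embodiment above describes.
print(weight_tokens(["cat", "oil painting"], {"oil painting": 1.4}))
# cat, (oil painting:1.4)
```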
In some embodiments, the method for generating an image further includes receiving object information corresponding to the physical size of an object, and determining the dimensions of the artificial intelligence image based on the object information.
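A hedged sketch of mapping an object's physical size to output pixel dimensions; the 300 DPI default is an assumed print resolution, not a value from the patent.

```python
def image_size_px(width_mm, height_mm, dpi=300):
    # Convert millimetres to inches, then inches to pixels at the given DPI.
    mm_per_inch = 25.4
    return (round(width_mm / mm_per_inch * dpi),
            round(height_mm / mm_per_inch * dpi))

# A 50.8 mm x 25.4 mm (2 in x 1 in) label printed at 300 DPI:
print(image_size_px(50.8, 25.4))  # (600, 300)
```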
In some embodiments, generating the keyword character set based on the at least one input image includes: receiving at least one decorative image; performing at least one composition operation on the at least one input image and the at least one decorative image to produce a layout image; inputting the layout image into the image description model; and outputting the keyword character set through the image description model. The image description model automatically generates the keyword character set corresponding to the layout image.
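An illustrative composition step: compute where each decorative image lands on the input-image canvas before the layout image is captioned. Corner anchoring here stands in for whatever composition operation is actually used; the anchor names are assumptions.

```python
def compose_layout(base_size, decorations):
    # decorations: list of (name, (width, height), anchor) tuples.
    base_w, base_h = base_size
    positions = []
    for name, (w, h), anchor in decorations:
        if anchor == "top-left":
            positions.append((name, (0, 0)))
        elif anchor == "bottom-right":
            positions.append((name, (base_w - w, base_h - h)))
        else:
            raise ValueError(f"unknown anchor {anchor!r}")
    return positions

print(compose_layout((800, 600), [("frame", (100, 80), "bottom-right")]))
# [('frame', (700, 520))]
```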
In some embodiments, the method for generating an image further includes transmitting the artificial intelligence image to an image output device and physically outputting the artificial intelligence image through the image output device.
The present application also provides a method for generating an image, suitable for generating a style-converted image through a computer device. The method includes receiving at least one input image and, after receiving a style conversion request sent from any one of at least one style conversion button, inputting the at least one input image into a style conversion model based on the style conversion request and outputting the style-converted image through the style conversion model. The style conversion model performs at least one style conversion operation on the at least one input image based on a style learning result obtained from at least one learning image, thereby producing the style-converted image.
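A stand-in for the claimed flow: a style conversion model, trained on learning images, is invoked when a style conversion button fires. The "model" here is a lookup of learned style parameters, not a real neural style-transfer network; names and values are hypothetical.

```python
class StyleConversionModel:
    def __init__(self, learned_styles):
        # learned_styles: style name -> learning result from learning images
        self.learned = learned_styles

    def convert(self, image, style):
        if style not in self.learned:
            raise ValueError(f"no style learning result for {style!r}")
        return f"{image} rendered with {self.learned[style]}"

model = StyleConversionModel({"oil_painting": "thick-brushstroke palette"})

def on_style_button(input_image, request):
    # Style conversion request sent from one of the style conversion buttons.
    return model.convert(input_image, request)

print(on_style_button("input.png", "oil_painting"))
# input.png rendered with thick-brushstroke palette
```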
In some embodiments, the at least one learning image is an artificial intelligence image generated by any of the methods for generating an image described above.
In some embodiments, the method for generating an image further includes transmitting the style-converted image to an image output device and physically outputting the style-converted image through the image output device.
The present application also provides a computer device for generating an image, suitable for being signal-connected to a user terminal so as to receive at least one input image from the user terminal and to generate an output image based on the at least one input image. The computer device includes an image receiving module, a processing module, and a storage module. The image receiving module is configured to be signal-connected to the user terminal and to receive the at least one input image. The processing module is configured to be signal-connected to the image receiving module and, after executing the program code stored in the storage module, to perform any of the methods for generating an image described above to produce an artificial intelligence image or a style-converted image, and to use that image as the output image. The storage module is configured to be signal-connected to the processing module and to store the program code.
In some embodiments, the computer device for generating an image further includes an image output module, which is signal-connected to the processing module and is suitable for transmitting the output image to an image output device.
The present application also provides a computer-readable recording medium for generating an image: when a computer device loads the program code on it into memory and executes the program code, the device can carry out any of the methods for generating an image described above.
The present application further provides a computer program product for generating an image: when a computer device loads and executes the computer program product, the device can carry out any of the methods for generating an image described above.
The technical means provided by the present application achieve beneficial effects that the prior art cannot. Specifically, one beneficial effect is that a user can generate the expected image file without entering any additional prompts, which lowers the user's operating threshold and/or difficulty, so that even users unfamiliar with prompts can generate the image files they expect through the technical means provided by the present application.
The present application is described in detail through the following embodiments and the accompanying drawings, to help those of ordinary skill in the art understand its purposes, features, and effects. Note that the steps described herein may be executed sequentially, in reverse order, or with the order changed or steps skipped as appropriate during processing. Note also that the statement "the first step may be executed after the second step" may mean either that the first step directly follows the second step, or that other steps (for example, a third step) are executed in between.
In addition, terms such as "first", "second", and "third" are used to distinguish elements from one another, not to limit the elements themselves or to indicate a particular ordering. In the following description, the same elements or steps may be denoted by the same reference numbers.
Please refer to FIG. 1, a schematic diagram illustrating how the computer device 200 of one embodiment of the present application is signal-connected to at least one of a user device 110 and a server 120, and to at least one of an image output device 310 and a server 320.
The user device 110 and/or the server 120 may be signal-connected to the computer device 200, specifically by arranging a signal line between them, so that the user device 110 and/or the server 120 can provide an input image (specifically, at least one input image) to the computer device 200 via that signal line. In some embodiments, the input image may be provided by, for example, a co-branding partner, an artist, or a consumer, but is not limited thereto. In some embodiments, the input image may depict, for example, a portrait, fruit, an animal, an anime character, or an original character, but is not limited thereto. In one specific example, the user may provide one or more photo files to the computer device 200 as input images through the user device 110.
In some embodiments, the computer device 200 may receive the input image from the user device 110 and/or the server 120 over a physical signal line, such as a network cable conforming to the Internet Protocol, but not limited thereto. In other embodiments, the connection need not be physical: the computer device 200 may receive the input image over a virtual signal line, such as a wireless link conforming to Wi-Fi, 4G/5G, Bluetooth, or near-field communication protocols, but not limited thereto.
In some embodiments, the user device 110 may be a computer device capable of storing files, such as a smartphone, a tablet, or a personal computer, but is not limited thereto. In some embodiments, the server 120 may be a computer device capable of storing files, such as a physical host or a virtual cloud server, but is not limited thereto.
The computer device 200 may be configured to receive input images from the user device 110 and/or the server 120 and to generate artificial intelligence images based on them, specifically by executing the steps of the method for generating an image provided by the present application to produce an artificial intelligence image and/or a style-converted image. In some embodiments, the input images may also be stored in the computer device 200 in advance, so that the computer device 200 can generate artificial intelligence images and/or style-converted images from locally stored input images.
The computer device 200 basically includes an image receiving module 210, a processing module 220 (that is, at least one of processing modules 220A and 220B), and a storage module 230. In some embodiments, the computer device 200 further includes an image output module 240; in other embodiments, an image database 250; in still other embodiments, an editing database 260. In other words, the image output module 240, the image database 250, and the editing database 260 may be selectively configured in the computer device 200 according to the user's needs.
The image receiving module 210 may be configured to be signal-connected to the user device 110 and/or the server 120 and to receive the input image (specifically, at least one input image) from them. In some embodiments, the computer device 200 may store input images received through the image receiving module 210 in the image database 250.
The processing module 220 may be configured to be signal-connected to the image receiving module 210 and to implement any of the methods for generating an image provided by the present application. More specifically, after executing the steps of those methods, the processing module 220 can generate an artificial intelligence image and/or a style-converted image from the input image. In some embodiments, the processing module 220 may be a central processing unit, specifically any type of processor known to those of ordinary skill in the art, but is not limited thereto.
In some embodiments, the processing module 220 refers to a processing module 220A capable of generating an artificial intelligence image from the input image. In other embodiments, it refers to a processing module 220B capable of generating a style-converted image from the input image. In still other embodiments, the functions and configurations of processing modules 220A and 220B may be integrated into a single processing module that can generate both artificial intelligence images and style-converted images from the input image.
The storage module 230 may be configured to be signal-connected to the processing module 220 (including processing module 220A and/or 220B) and to store program code, so that after loading and executing that code the processing module 220 can perform the steps of the method for generating an image provided by the present application. In some embodiments, the storage module 230 basically includes volatile memory (specifically, at least one volatile memory) and non-volatile memory (specifically, at least one non-volatile memory). The volatile memory may be any type known to those of ordinary skill in the art, for example dynamic or static random-access memory, but is not limited thereto. The non-volatile memory may likewise be any known type, for example read-only memory, flash memory, or non-volatile random-access memory, but is not limited thereto.
The image output module 240 may be configured to be signal-connected to the processing module 220 (including processing module 220A and/or 220B) and to output the images produced by the processing module 220. In some embodiments, the image output module 240 may output the artificial intelligence image produced by processing module 220A and/or the style-converted image produced by processing module 220B to the image output device 310 and/or the server 320.
In some embodiments, the artificial intelligence image produced by processing module 220A and/or the style-converted image produced by processing module 220B may be stored in the image database 250; after receiving an image output request, processing module 220A and/or 220B may output the stored artificial intelligence image and/or style-converted image to the image output device 310 and/or the server 320.
The image database 250 may be configured to store images, for example input images, artificial intelligence images, and/or style-converted images, but is not limited thereto. In some embodiments, the file format of the images stored in the image database 250 may be Joint Photographic Experts Group (JPEG), Portable Network Graphics (PNG), bitmap (BMP), Graphics Interchange Format (GIF), or Tagged Image File Format (TIFF), but is not limited thereto.
The editing database 260 may be configured to store the user's editing history, for example the history of string editing operations and/or style conversion operations, particularly those corresponding to the input images, but not limited thereto. In some embodiments, the user's editing history may also include the time points at which string editing operations and/or style conversion operations were executed.
The computer device 200 may be signal-connected to the image output device 310 and/or the server 320, specifically by arranging a signal line between them, so that the computer device 200 can output the artificial intelligence image to the image output device 310 and/or the server 320 via that signal line.
In some embodiments, the computer device 200 may output the artificial intelligence image to the image output device 310 and/or the server 320 over a physical signal line, such as a network cable conforming to the Internet Protocol, but not limited thereto. In other embodiments, the connection need not be physical: the computer device 200 may output the artificial intelligence image over a virtual signal line, such as a wireless link conforming to Wi-Fi, 4G/5G, Bluetooth, or near-field communication protocols, but not limited thereto.
In this way, the computer device 200 shown in FIG. 1 can not only generate artificial intelligence images and/or style-converted images from received input images, but also let users produce those images without entering any additional prompts. In other words, the computer device for generating an image provided by the present application lowers the user's operating threshold and/or difficulty, so that even users unfamiliar with prompts can generate the image files they expect through the computer device.
請參考圖2,圖2是說明本申請之一個實施例之處理模組220A的方塊示意圖。處理模組220A的配置可以與圖1所示的處理模組220基本相同,其不同之處在於,處理模組220A可以是專門被配置成基於所述輸入圖像來產生人工智慧圖像,而基於所述輸入圖像來產生風格轉換圖像的功能與配置可以被省略。Please refer to FIG2, which is a block diagram of a processing module 220A of an embodiment of the present application. The configuration of the processing module 220A may be substantially the same as the processing module 220 shown in FIG1, except that the processing module 220A may be specifically configured to generate an artificial intelligence image based on the input image, and the function and configuration of generating a style conversion image based on the input image may be omitted.
處理模組220A基本可以被配置成包括圖像接收單元221A、字串產生單元222A、修圖請求接收單元223A、字串編修單元224A、圖像產生單元225A和圖像輸出單元226A。在一些實施例中,處理模組220A可以進一步包括影像描述模型270。在一些實施例中,處理模組220A可以進一步包括圖像產生模型280。也就是說,影像描述模型270及/或圖像產生模型280可以根據使用者的需求而選擇性地被配置於處理模組220A中。在一些實施例中,影像描述模型270的功能與配置可以被整合進字串產生單元222A中。在一些實施例中,圖像產生模型280的功能與配置可以被整合進圖像產生單元225A中。The processing module 220A can be basically configured to include an image receiving unit 221A, a string generating unit 222A, a picture editing request receiving unit 223A, a string editing unit 224A, an image generating unit 225A and an image output unit 226A. In some embodiments, the processing module 220A can further include an image description model 270. In some embodiments, the processing module 220A can further include an image generating model 280. In other words, the image description model 270 and/or the image generating model 280 can be selectively configured in the processing module 220A according to the needs of the user. In some embodiments, the function and configuration of the image description model 270 can be integrated into the string generating unit 222A. In some embodiments, the function and configuration of the image generating model 280 can be integrated into the image generating unit 225A.
The image receiving unit 221A may be configured to receive an input image (specifically, at least one input image) from the user device 110 and/or the server 120 through the image receiving module 210. In some embodiments, the image receiving unit 221A may also receive the input image from the image database 250; that is, the input image may be stored in advance in the image database 250 through the image receiving module 210, so that the image receiving unit 221A can also retrieve it from the image database 250.
The string generation unit 222A may be configured to generate a keyword character set based on the input image, more specifically, based on the input image received by the image receiving unit 221A. In some embodiments, the string generation unit 222A may input the input image into the image description model 270, which automatically generates and outputs the keyword character set. In some embodiments, the image description model 270 may be, for example, a CLIP model or DeepBooru, but is not limited thereto.
In more detail, the CLIP model is a deep learning model trained in advance on a plurality of records (each record including a training image and its corresponding training string), so that, after recognizing a received input image, the trained CLIP model can match the input image to a corresponding string. DeepBooru is likewise a deep learning model trained in advance on such records, so that the trained DeepBooru can also match a received input image to a corresponding string (i.e., a keyword character set).
Both the CLIP model and DeepBooru can generate a corresponding string from an image. The difference between the two is that DeepBooru directly produces a keyword character set corresponding to the input image, whereas the CLIP model produces a complete caption string corresponding to the input image, from which the keyword character set is then derived. In some embodiments, the CLIP model may be paired with a large language model such as ChatGPT to derive the keyword character set from the complete string produced by the CLIP model, but is not limited thereto.
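The reduction from a complete caption string to a keyword character set can be sketched as follows. This is a minimal illustration only: the stopword list, splitting rules, and function name are assumptions for demonstration, not the application's actual implementation (which may instead rely on a large language model as noted above).

```python
# Minimal sketch of deriving a keyword character set from a full
# caption string, as produced by a CLIP-style captioning model.
# The stopword list and splitting rules are illustrative assumptions.
STOPWORDS = {"a", "an", "the", "of", "on", "in", "with", "and", "is", "at"}

def caption_to_keywords(caption: str) -> list[str]:
    """Reduce a free-form caption to an ordered, de-duplicated keyword set."""
    seen, keywords = set(), []
    for word in caption.lower().replace(",", " ").split():
        token = word.strip(".")
        if token and token not in STOPWORDS and token not in seen:
            seen.add(token)
            keywords.append(token)
    return keywords

print(caption_to_keywords("A beagle dog sitting on the grass, looking at the viewer"))
# → ['beagle', 'dog', 'sitting', 'grass', 'looking', 'viewer']
```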
The image editing request receiving unit 223A may be configured to receive an image editing request sent by an image editing button (such as the image editing buttons 630 shown in FIG. 6). That is, when the user clicks any one of the at least one image editing button, the clicked button sends a corresponding image editing request, which the image editing request receiving unit 223A then receives.
The string editing unit 224A may be configured to perform a string editing operation on the keyword character set, specifically on the keyword character set generated by the string generation unit 222A. More specifically, after the image editing request receiving unit 223A receives an image editing request sent by any one of the at least one image editing button, the string editing unit 224A performs the string editing operation on the keyword character set and produces an edited character set. The string editing operation may include adding, modifying, and/or deleting, but is not limited thereto. In some embodiments, the string editing operations may be determined in advance for each of the image editing buttons 630 shown in FIG. 6, so that the string editing unit 224A can produce the edited character set according to the string editing operation corresponding to the clicked button.
The image generation unit 225A may be configured to generate an artificial intelligence image based on the edited character set, more specifically, based on the edited character set produced by the string editing unit 224A. In some embodiments, the image generation unit 225A may input the edited character set into the image generation model 280, which automatically generates and outputs the artificial intelligence image. In some embodiments, the image generation model 280 may be any model known to a person of ordinary skill in the art that can generate an image from a string, for example Stable Diffusion, but is not limited thereto.
Stable Diffusion is a deep learning model trained in advance on a plurality of records (each record including a training string and its corresponding training image), so that, after recognizing a received edited character set, the trained model can generate an artificial intelligence image corresponding to that edited character set.
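A hedged sketch of feeding an edited character set to a text-to-image model through the Hugging Face `diffusers` library mentioned later in this application. The checkpoint name and the comma-joining convention are assumptions; the heavy model call is deferred so that the prompt-assembly step can be shown on its own.

```python
# Sketch: turn an edited character set into a prompt string and,
# optionally, generate an image with diffusers' StableDiffusionPipeline.
# The checkpoint name "runwayml/stable-diffusion-v1-5" is an assumption.
def charset_to_prompt(edited_charset: list[str]) -> str:
    """Join the edited character set into a comma-separated prompt string."""
    return ", ".join(edited_charset)

def generate_image(edited_charset: list[str]):
    # Deferred import: only needed when actually generating an image.
    from diffusers import StableDiffusionPipeline  # pip install diffusers

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    return pipe(charset_to_prompt(edited_charset)).images[0]

print(charset_to_prompt(["beagle dog", "ghibli", "simple_background"]))
# → beagle dog, ghibli, simple_background
```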
The image output unit 226A may be configured to output the artificial intelligence image generated by the image generation unit 225A. In some embodiments, the image output unit 226A may store the artificial intelligence image in a specific file format in the image database 250. In some embodiments, the image output unit 226A may output the artificial intelligence image in a specific file format to the image output device 310 and/or the server 320 through the image output module 240.
Thus, the processing module 220A shown in FIG. 2 can generate an artificial intelligence image from a received input image without requiring the user to enter an additional command set. In other words, the computing device for generating images provided by the present application lowers the user's operating threshold and/or operating difficulty, so that even users unfamiliar with command sets can produce the desired image file through the device.
Please refer to FIG. 3, a flowchart of a method for generating an image according to the first embodiment of the present application. More specifically, the method shown in FIG. 3 can generate an artificial intelligence image and includes steps S310, S320, S330, and S340. In some embodiments, the method may be developed in Visual Studio Code and implemented in Python on the PyTorch and Hugging Face Diffusers frameworks. In some embodiments, the method may use open-source libraries such as torch, diffusers, and numpy.
In step S310, an input image (specifically, at least one input image) is received. Step S310 may be executed by the image receiving unit 221A of the processing module 220A shown in FIG. 2. In some embodiments, the input image may be an original image file (i.e., one that has not undergone any image processing operation), such as a photo file captured by a camera module (not shown), but is not limited thereto. In other embodiments, the input image may be an image file that has undergone an image processing operation (specifically, at least one such operation), for example one produced by image processing software such as Photoshop, but is not limited thereto. In some embodiments, the image processing operation may include background removal, resizing, mirror flipping, or rotation, but is not limited thereto.
In step S320, a keyword character set is generated based on the input image. Step S320 may be executed by the string generation unit 222A of the processing module 220A shown in FIG. 2. In some embodiments, step S320 may follow step S310. In some embodiments, after step S310, an image processing operation may first be performed on the input image before step S320 is executed (i.e., the keyword character set is generated from the processed input image).
In some embodiments, the input image may be analyzed by a pre-established image recognition model (not shown) to identify the main elements and/or other (non-main) elements in the input image, and the keyword character set is then generated from those elements. More specifically, ControlNet-OpenPose may be used to analyze hand poses and limb movements in the input image, and ControlNet-Canny may be used for object edge detection and boundary delineation, with the keyword character set generated from the analysis results.
In step S330, after an image editing request sent from any one of the image editing buttons (specifically, at least one image editing button) is received, a string editing operation (specifically, at least one string editing operation) is performed on the keyword character set based on the editing instruction set corresponding to the request, producing an edited character set. Step S330 may be executed by the image editing request receiving unit 223A and the string editing unit 224A of the processing module 220A shown in FIG. 2. In some embodiments, step S330 may follow step S320.
In some embodiments, the image editing buttons may be pre-established, so that the user can operate on the input image (i.e., perform a string editing operation on the corresponding keyword character set) by clicking any one of them. In some embodiments, there may be a plurality of image editing buttons, giving the user a better experience in choosing a suitable one. In some embodiments, the label content corresponding to each image editing button may be edited so that the user can more intuitively understand, from the label, the expected result of the string editing operation that each button applies to the keyword character set.
In a specific example, the image editing buttons may include a first button (labeled "American style"), a second button (labeled "Japanese style"), and a third button (labeled "Ghibli style"). When the user clicks the third button, it issues an image editing request; after the image editing request receiving unit 223A receives that request, the string editing unit 224A performs the corresponding string editing operation on the keyword character set based on the associated editing instruction set (i.e., appends the text "ghibli" to the content of the keyword character set).
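The button-to-instruction mapping in this example can be sketched as follows. The button identifiers and the instruction encoding are illustrative assumptions, not the patent's concrete data model.

```python
# Sketch of mapping pre-established image editing buttons to editing
# instruction sets; each instruction here is an "add this token" pair.
BUTTONS = {
    "american_style": ("add", "american comic"),
    "japanese_style": ("add", "anime"),
    "ghibli_style":   ("add", "ghibli"),
}

def handle_request(keywords: list[str], button_id: str) -> list[str]:
    """Apply the clicked button's editing instruction to the keyword set."""
    op, token = BUTTONS[button_id]
    if op == "add" and token not in keywords:
        return keywords + [token]
    return list(keywords)

print(handle_request(["beagle dog", "simple_background"], "ghibli_style"))
# → ['beagle dog', 'simple_background', 'ghibli']
```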
More specifically, by clicking the third button (labeled "Ghibli style"), the user switches to the pre-trained base model and the style fine-tuning layer (LoRA layer) best suited to that style; the most suitable combination of model and parameters, selected through prior testing, forms the instruction set corresponding to the third button, and the string editing operation is performed on the keyword character set through that instruction set.
Thus, step S330 allows the user to perform string editing operations without entering an additional command set. That is, this step lowers the user's operating threshold and/or operating difficulty, so that even users unfamiliar with command sets can edit the keyword character set corresponding to the input image simply by clicking an image editing button.
In some embodiments, step S330 may be executed more than once; that is, the processing module 220A may execute step S330 one or more times, and the number of executions may depend on how many times the user clicks an image editing button. Specifically, after the user clicks any image editing button so that the processing module 220A executes step S330 and produces an edited character set, the user may click any image editing button again so that step S330 is executed once more and a new edited character set is produced (i.e., the previously produced edited character set is treated as the current keyword character set, on which the next string editing operation is performed). In some embodiments, after receiving a restore request, the processing module 220A may restore the current edited character set to the keyword character set or to an older edited character set (for example, the edited character set as it was before the most recent string editing operation). In some embodiments, when the processing module 220A receives the same image editing request consecutively, it performs the corresponding string editing operation only upon the first receipt; that is, upon the second, third, ..., Nth receipt of that request, it no longer applies the operation to the current edited character set.
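The repeated-click behavior, the restore request, and the suppression of consecutive duplicate requests can be sketched together as a small edit history. The class and method names are hypothetical; only add-style edits are modeled here.

```python
# Sketch of an edit history that supports restore (undo) and ignores
# consecutive duplicate image editing requests.
class EditSession:
    def __init__(self, keywords: list[str]):
        self.history = [list(keywords)]   # history[0] is the original keyword set
        self.last_request: str | None = None

    def edit(self, request_id: str, token: str) -> list[str]:
        """Apply an add-style edit unless it repeats the previous request."""
        if request_id == self.last_request:
            return self.history[-1]       # consecutive duplicate: no new edit
        self.last_request = request_id
        self.history.append(self.history[-1] + [token])
        return self.history[-1]

    def restore(self) -> list[str]:
        """Drop the latest edit, falling back toward the original set."""
        if len(self.history) > 1:
            self.history.pop()
        self.last_request = None
        return self.history[-1]

session = EditSession(["cat", "realistic"])
session.edit("ghibli_style", "ghibli")   # adds "ghibli"
session.edit("ghibli_style", "ghibli")   # duplicate request: unchanged
print(session.restore())                 # → ['cat', 'realistic']
```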
In step S340, an artificial intelligence image is generated based on the edited character set. Step S340 may be executed by the image generation unit 225A of the processing module 220A shown in FIG. 2. In some embodiments, step S340 may follow step S330.
Thus, the method shown in FIG. 3 can generate an artificial intelligence image from a received input image without requiring the user to enter an additional command set. In other words, the computing device for generating images provided by the present application lowers the user's operating threshold and/or operating difficulty, so that even users unfamiliar with command sets can produce the desired image file through the device.
Please refer to FIG. 4A, a detailed flowchart of an example of generating the keyword character set based on the at least one input image (i.e., step S320). That is, step S320 shown in FIG. 3 may include steps S410A, S420A, and S430A and may be completed by executing them; steps S410A, S420A, and S430A may be executed by the string generation unit 222A and the image description model 270 of the processing module 220A shown in FIG. 2.
In step S410A, the input image is input into the image description model 270. In some embodiments, step S410A may follow step S310. In some embodiments, the image description model 270 may be, for example, a CLIP model or DeepBooru, but is not limited thereto.
In step S420A, the keyword character set is generated by the image description model 270. In some embodiments, step S420A may follow step S410A. Because the image description model 270 has been trained in advance on a plurality of records (each including a training image and its corresponding training string), the trained model can, after recognizing the input image, automatically and more accurately generate the keyword character set corresponding to it.
In step S430A, the keyword character set is output by the image description model 270. In some embodiments, step S430A may follow step S420A. That is, executing step S430A allows the processing module 220A to perform subsequent processing (i.e., step S330) on the keyword character set generated and output by the image description model 270.
In some embodiments, steps S420A and S430A may be merged into a single step in which the image description model 270 generates and outputs the keyword character set.
Thus, by executing the steps shown in FIG. 4A, the keyword character set corresponding to the input image can be generated automatically and more accurately from the input image.
Please refer to FIG. 4B, a detailed flowchart of an example of generating the artificial intelligence image based on the edited character set (i.e., step S340). That is, step S340 shown in FIG. 3 may include steps S410B, S420B, and S430B and may be completed by executing them; steps S410B, S420B, and S430B may be executed by the image generation unit 225A and the image generation model 280 of the processing module 220A shown in FIG. 2.
In step S410B, the edited character set is input into the image generation model 280. In some embodiments, step S410B may follow step S330. In some embodiments, the image generation model 280 may be any model known to a person of ordinary skill in the art that can generate an image from a string, for example Stable Diffusion, but is not limited thereto.
In step S420B, the artificial intelligence image is generated by the image generation model 280. In some embodiments, step S420B may follow step S410B. Because the image generation model 280 has been trained in advance on a plurality of records (each including a training string and its corresponding training image), the trained model can, after recognizing the received edited character set, generate the artificial intelligence image corresponding to it.
In step S430B, the artificial intelligence image is output by the image generation model 280. In some embodiments, step S430B may follow step S420B. That is, executing step S430B allows the computing device 200 to perform subsequent processing on the artificial intelligence image generated and output by the image generation model 280 (for example, displaying it on the user's screen or outputting it to an image output device, but not limited thereto).
In some embodiments, steps S420B and S430B may be merged into a single step in which the image generation model 280 generates and outputs the artificial intelligence image.
Thus, by executing the steps shown in FIG. 4B, the artificial intelligence image corresponding to the edited character set can be generated automatically and more accurately from the edited character set.
Please refer to FIG. 5, a schematic diagram of the results produced after each step in an example of the present application. More specifically, FIG. 5 illustrates that, by executing steps S310, S320, S330, and S340 shown in FIG. 3, the processing module 220A can generate the artificial intelligence image 540 based on the input image 510, as detailed below.
First, by executing step S310, the processing module 220A receives the input image 510 from the user device 110 and/or the server 120 via the image receiving module 210. Next, by executing step S320, it generates the keyword character set 520 based on the received input image 510. Then, by executing step S330, after receiving an image editing request sent from any one of the image editing buttons 630 shown in FIG. 6, it performs the string editing operation on the keyword character set 520 based on the corresponding editing instruction set (i.e., appends the text "beagle dog" to the content of the keyword character set 520), producing the edited character set 530. Finally, by executing step S340, it generates the artificial intelligence image 540 based on the edited character set 530.
Please refer to FIG. 6, a schematic diagram of a display screen 600 on the user device according to an embodiment of the present application. The display screen 600 basically includes a first display block 610 and image editing buttons 630 (specifically, at least one image editing button 630). In some embodiments, the display screen 600 may further include a second display block 620. Note that the arrangement of the elements shown in FIG. 6 (for example, their relative sizes and/or positions) may be adjusted according to the user's needs; the arrangement shown is only a schematic example.
The image editing buttons 630 allow the user to operate on the input image 510 (i.e., perform a string editing operation on the corresponding keyword character set 520) by clicking any one of them. In some embodiments, the image editing buttons 630 may be pre-established. In other embodiments, they may be created automatically according to at least one of the content of the keyword character set 520 and the user's editing history (step S710 shown in FIG. 7). In some embodiments, there may be a plurality of image editing buttons 630, giving the user a better experience in choosing a suitable one. In some embodiments, the label content corresponding to at least one image editing button 630 may be edited so that the user can more intuitively understand, from the label, the expected result of the string editing operation that each button applies to the keyword character set 520.
In some embodiments, the first display block 610 may be configured to display the artificial intelligence image 540. More specifically, after clicking any of the image editing buttons 630, the user can see the generated artificial intelligence image 540 in the first display block 610. Through this display, the user can evaluate whether the current artificial intelligence image 540 meets expectations and decide whether to regenerate a new one via the image editing buttons 630.
In some embodiments, the second display block 620 may be configured to display at least one of the keyword character set 520 and the edited character set 530; that is, it may first display the keyword character set 520 and, after the string editing operation, display the edited character set 530. In some embodiments, the second display block 620 may be omitted; that is, the display screen 600 may show only the first display block 610 and the image editing buttons 630, which can provide a better user experience.
Please refer to FIG. 7, a flowchart of a method for generating an image according to the second embodiment of the present application. More specifically, the method shown in FIG. 7 can generate an artificial intelligence image and includes steps S310, S320, S330, S340, and S710. Steps S310, S320, S330, and S340 are substantially the same as those shown in FIG. 3; that is, the method shown in FIG. 7 includes those steps and further includes step S710.
In step S710, an image editing button (specifically, at least one image editing button) is generated automatically according to at least one of the content of the keyword character set and the user's editing history. Step S710 may be executed by a button generation unit (not shown) of the processing module 220A. In some embodiments, step S710 may follow step S320. In some embodiments, the image editing buttons may be generated automatically from the content of the keyword character set through a text analysis model. In some embodiments, the text analysis model may be, for example, BERTScore, but is not limited thereto.
所述BERTScore是一種能夠自動地評估兩個文本之間的語意相近程度的文本分析模型,因此可以透過所述BERTScore對關鍵詞字元組和其他的字元組計算其各自的分數,並且透過所述BERTScore提供與當下的關鍵詞字元組最接近的編修字元組,進而自動地產生能夠基於所述關鍵詞字元組來產生所述編修字元組的修圖按鍵。The BERTScore is a text analysis model that can automatically evaluate the semantic similarity between two texts. Therefore, the BERTScore can be used to calculate the scores of keyword character groups and other character groups, and the BERTScore can be used to provide an editing character group that is closest to the current keyword character group, thereby automatically generating a photo editing button that can generate the editing character group based on the keyword character group.
在一個具體的範例中,當關鍵詞字元組的內容為“cat, rating:safe, animal, no_humans, simple_background, realistic, animal_focus, looking_at_viewer, pubic_hair, black_background”時,由於透過所述BERTScore可以計算出吉卜力風格的分數為0.7、畢卡索風格的分數為0.2和油畫風格的分數為0.1,因此透過所述BERTScore可以自動地產生第四修圖按鍵(標籤內容顯示為吉卜力風格)、第五修圖按鍵(標籤內容顯示為畢卡索風格)和第六修圖按鍵(標籤內容顯示為油畫風格)。In a specific example, when the content of the keyword character group is "cat, rating:safe, animal, no_humans, simple_background, realistic, animal_focus, looking_at_viewer, pubic_hair, black_background", since the BERTScore can calculate that the score of Ghibli style is 0.7, the score of Picasso style is 0.2, and the score of oil painting style is 0.1, the BERTScore can automatically generate a fourth photo editing button (the label content is displayed as Ghibli style), a fifth photo editing button (the label content is displayed as Picasso style), and a sixth photo editing button (the label content is displayed as oil painting style).
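The button-suggestion flow above can be sketched as follows. This is a minimal illustration only: the patent describes using BERTScore, whereas here the similarity function is stubbed with the fixed example scores from the text (Ghibli 0.7, Picasso 0.2, oil painting 0.1), and the function names are invented for this sketch.

```python
# Hypothetical sketch of automatic photo-editing-button generation from
# similarity scores. The real system uses a BERTScore model; here the
# scoring function is stubbed with the fixed values from the example.

def suggest_buttons(keyword_string, candidate_styles, score_fn):
    """Rank candidate style labels by semantic similarity to the keywords."""
    scored = [(style, score_fn(keyword_string, style)) for style in candidate_styles]
    # The highest-scoring styles become photo editing buttons, in descending order.
    return [style for style, _ in sorted(scored, key=lambda p: p[1], reverse=True)]

# Stub scores taken from the example: Ghibli 0.7, Picasso 0.2, oil painting 0.1.
example_scores = {"ghibli": 0.7, "picasso": 0.2, "oil_painting": 0.1}
keywords = "cat, rating:safe, animal, no_humans, realistic"
buttons = suggest_buttons(keywords, list(example_scores),
                          lambda kw, s: example_scores[s])
# buttons == ["ghibli", "picasso", "oil_painting"]
```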
在一些實施例中,可以根據儲存在編修資料庫260中的使用者的歷史編修記錄來自動地產生修圖按鍵。舉例來說,可以根據使用者的歷史編修記錄來計算使用者較常使用的字串編修操作,進而根據使用者較常使用的字串編修操作來產生修圖按鍵。In some embodiments, the photo editing buttons can be automatically generated based on the user's historical editing records stored in the editing database 260. For example, the user's frequently used string editing operations can be calculated based on the user's historical editing records, and then the photo editing buttons can be generated based on the user's frequently used string editing operations.
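Generating buttons from the user's historical editing records can be sketched as a simple frequency count. The history format (a flat list of operation labels) is an assumption for illustration; the editing database 260 is not modeled here.

```python
from collections import Counter

# Hedged sketch: derive photo editing buttons from a user's editing history
# by picking the most frequently used string editing operations.
def buttons_from_history(history, top_n=3):
    counts = Counter(history)
    return [op for op, _ in counts.most_common(top_n)]

history = ["add:ghibli", "add:beagle_dog", "add:ghibli",
           "add:oil_painting", "add:ghibli"]
# The most frequent operations come first.
assert buttons_from_history(history, top_n=2) == ["add:ghibli", "add:beagle_dog"]
```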
藉此,透過如圖7所示的步驟S710可以進一步根據使用者的使用情形而自動地產生適合使用者的修圖按鍵,供使用者可以更有效地點選適合的修圖按鍵來對與輸入圖像相對應的關鍵詞字元組進行字串編修操作,並產生人工智慧圖像。Thus, through step S710 as shown in FIG. 7 , suitable photo editing buttons can be automatically generated according to the user's usage situation, so that the user can more effectively click the suitable photo editing button to perform string editing operations on the keyword character group corresponding to the input image and generate an artificial intelligence image.
在一些實施例中,步驟S710也可以進一步根據編修字元組的內容來自動地產生修圖按鍵。也就是說,在執行完步驟S330之後,可以再次執行步驟S710,藉以透過步驟S710而根據編修字元組的內容及使用者的歷史編修記錄中的至少一者來自動地產生修圖按鍵。In some embodiments, step S710 may further automatically generate a photo editing button according to the content of the edited character set. That is, after executing step S330, step S710 may be executed again, so that through step S710, a photo editing button is automatically generated according to at least one of the content of the edited character set and the user's historical editing record.
請參考圖8,圖8是說明本申請之另一個範例之在執行各個步驟之後所產生的結果的示意圖。更具體地說,圖8是說明處理模組220A透過執行如圖3所示的步驟S310、S320、S330和S340可以基於輸入圖像510來產生人工智慧圖像840,詳細說明如下文所述。Please refer to FIG. 8, which is a schematic diagram illustrating the results produced after executing the steps in another example of the present application. More specifically, FIG. 8 illustrates that the processing module 220A can generate an artificial intelligence image 840 based on the input image 510 by executing steps S310, S320, S330 and S340 as shown in FIG. 3, as described in detail below.
由於使用者可以分別點選各個修圖按鍵使得處理模組220A可以分別執行相對應的字串編修操作,因此,在一些實施例中,各個修圖按鍵可以具有各自的堆疊屬性值,使得處理模組220A在執行步驟S330時可以基於分別與修圖按鍵中的每一個相對應的堆疊屬性值來決定是否將字串編修操作進行堆疊。也就是說,所述堆疊屬性值可以被配置成決定各個字串編修操作彼此之間是否可以互相堆疊。由於處理模組220A可以基於堆疊屬性值來決定各個字串編修操作彼此之間是否可以互相堆疊,因此處理模組220A能夠將可堆疊的字串編修操作進行堆疊,藉以供使用者能夠產生更多樣化的人工智慧圖像。同時,不可堆疊的字串編修操作可以避免使用者點選相反效果的修圖按鍵,而產生超出使用者所預期的人工智慧圖像。Since the user can click on each photo editing button to make the processing module 220A execute the corresponding string editing operation, in some embodiments, each photo editing button can have its own stacking attribute value, so that the processing module 220A can determine whether to stack the string editing operations based on the stacking attribute value corresponding to each photo editing button when executing step S330. In other words, the stacking attribute value can be configured to determine whether the individual string editing operations can be stacked with one another. Since the processing module 220A can determine, based on the stacking attribute values, whether the string editing operations can be stacked with one another, the processing module 220A can stack the stackable string editing operations, allowing the user to generate more diverse artificial intelligence images. At the same time, non-stackable string editing operations prevent the user from clicking photo editing buttons with opposite effects and producing an artificial intelligence image beyond the user's expectations.
請同時參考圖5和圖8,處理模組220A透過執行步驟S330可以在接收到從如圖6所示的修圖按鍵630中的任何一個所發送的修圖請求之後,基於與所述修圖請求相對應的編修指令集對關鍵詞字元組520進行字串編修操作(即在關鍵詞字元組520的內容中新增“beagle dog”的文字敘述),並產生如圖5所示的編修字元組530。若使用者再點選修圖按鍵630中的任何一個堆疊屬性值被設定為可堆疊的修圖按鍵630,則處理模組220A透過再次執行步驟S330可以在接收到堆疊屬性值被設定為可堆疊的修圖按鍵630所發送的修圖請求之後,基於與所述修圖請求相對應的編修指令集對如圖5所示的編修字元組530進行字串編修操作(即在如圖5所示的編修字元組530的內容中新增“ghibli”的文字敘述),並產生如圖8所示的編修字元組830。也就是說,圖8所示的編修字元組830是因應使用者點選兩個堆疊屬性值被設定為可堆疊的修圖按鍵之後所產生的。最後,處理模組220A透過執行步驟S340可以基於編修字元組830來產生人工智慧圖像840,藉此能夠產生更多樣化的人工智慧圖像。Please refer to FIG. 5 and FIG. 8 at the same time. After receiving a photo editing request sent from any one of the photo editing buttons 630 shown in FIG. 6, the processing module 220A can, by executing step S330, perform a string editing operation on the keyword character set 520 based on the editing instruction set corresponding to the photo editing request (i.e., adding the text description "beagle dog" to the content of the keyword character set 520) and generate the edit character set 530 shown in FIG. 5. If the user then clicks any of the photo editing buttons 630 whose stacking attribute value is set as stackable, the processing module 220A can, by executing step S330 again after receiving the photo editing request sent by that button, perform a string editing operation on the edit character set 530 shown in FIG. 5 based on the editing instruction set corresponding to the photo editing request (i.e., adding the text description "ghibli" to the content of the edit character set 530 shown in FIG. 5) and generate the edit character set 830 shown in FIG. 8. That is, the edit character set 830 shown in FIG. 8 is generated in response to the user clicking two photo editing buttons whose stacking attribute values are set as stackable. Finally, the processing module 220A can generate an artificial intelligence image 840 based on the edit character set 830 by executing step S340, thereby generating more diverse artificial intelligence images.
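The two stacked edits described above ("beagle dog", then "ghibli") can be sketched with a toy append-only edit model. The operation names and the list-of-keywords data model are assumptions; the real editing instruction sets may rewrite the string in other ways.

```python
# Minimal sketch of stacking string editing operations on a keyword set,
# mimicking the "beagle dog" then "ghibli" example from the text.

def apply_edit(keyword_set, new_term):
    """Return a new keyword set with the term appended."""
    return keyword_set + [new_term]

keywords = ["cat", "realistic", "black_background"]
edited = apply_edit(keywords, "beagle dog")   # first photo editing button
edited = apply_edit(edited, "ghibli")         # second (stackable) button
assert edited[-2:] == ["beagle dog", "ghibli"]
```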
在一些實施例中,各個堆疊屬性值可以分別被設定為可堆疊的或不可堆疊的,藉此可以更簡便地判斷各個字串編修操作彼此之間是否可以互相堆疊。在另一些實施例中,彼此可以互相堆疊的堆疊屬性值可以被設定為相同的群組代碼,藉以透過群組代碼的分類方式來決定各個字串編修操作彼此之間是否可以互相堆疊(例如,相同的群組代碼表示各個字串編修操作彼此之間是可堆疊的,而不同的群組代碼表示各個字串編修操作彼此之間是不可堆疊的),藉此可以更縝密地判斷各個字串編修操作彼此之間是否可以互相堆疊。In some embodiments, each stacking attribute value can be set as stackable or non-stackable, thereby making it easier to determine whether each string editing operation can be stacked with each other. In other embodiments, the stacking attribute values that can be stacked with each other can be set to the same group code, thereby determining whether each string editing operation can be stacked with each other through the classification of the group code (for example, the same group code indicates that each string editing operation is stackable with each other, while different group codes indicate that each string editing operation is non-stackable with each other), thereby making it more accurate to determine whether each string editing operation can be stacked with each other.
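The group-code rule described above can be sketched in a few lines: buttons sharing a group code stack, buttons with differing codes do not. The group-code strings and the button-to-code mapping are invented examples.

```python
# Sketch of the group-code stacking rule: the same group code means two
# string editing operations are stackable; different codes mean they are not.

def can_stack(group_a, group_b):
    return group_a == group_b

# Hypothetical assignment of group codes to photo editing buttons.
button_groups = {"ghibli": "style_A", "american": "style_A",
                 "japanese": "style_B", "realistic": "style_C"}
assert can_stack(button_groups["ghibli"], button_groups["american"])
assert not can_stack(button_groups["japanese"], button_groups["realistic"])
```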
在一些實施例中,堆疊屬性值可以根據使用者的偏好設定及預設值中的至少一者而被設定。在一個具體的範例中,堆疊屬性值可以先根據預設值而被設定,例如,第七修圖按鍵(標籤內容顯示為吉卜力風格)和第八修圖按鍵(標籤內容顯示為美式風格)被設定為可堆疊的,而第九修圖按鍵(標籤內容顯示為日式風格)和第十修圖按鍵(標籤內容顯示為寫實風格)被設定為不可堆疊的,之後可以再根據使用者的偏好設定來調整各個修圖按鍵的堆疊屬性值,例如,將第九修圖按鍵(標籤內容顯示為日式風格)調整設定為可堆疊的。In some embodiments, the stacking attribute value may be set according to at least one of a user's preference setting and a default value. In a specific example, the stacking attribute value may be set according to the default value first, for example, the seventh photo editing button (label content displayed as Ghibli style) and the eighth photo editing button (label content displayed as American style) are set as stackable, while the ninth photo editing button (label content displayed as Japanese style) and the tenth photo editing button (label content displayed as realistic style) are set as non-stackable, and then the stacking attribute value of each photo editing button may be adjusted according to the user's preference setting, for example, the ninth photo editing button (label content displayed as Japanese style) is adjusted to be stackable.
請參考圖9,圖9是說明本申請之又一個範例之在執行各個步驟之後所產生的結果的示意圖。更具體地說,圖9是說明處理模組220A透過執行如圖3所示的步驟S310、S320、S330和S340可以基於輸入圖像510來產生人工智慧圖像940,詳細說明如下文所述。Please refer to FIG. 9, which is a schematic diagram illustrating the results produced after executing the steps in yet another example of the present application. More specifically, FIG. 9 illustrates that the processing module 220A can generate an artificial intelligence image 940 based on the input image 510 by executing steps S310, S320, S330 and S340 as shown in FIG. 3, as described in detail below.
在一些實施例中,各個修圖按鍵可以具有各自的編修權重值,使得處理模組220A在執行步驟S330時可以基於分別與修圖按鍵中的每一個相對應的編修權重值來決定字串編修操作對關鍵詞字元組的編修程度。在一些實施例中,編修權重值可以被設定為介於0至1之間的值,並且可以採用中間值(例如,0.5)作為基準點。舉例來說,當編修權重值被設定為0.5時,表示字串編修操作對關鍵詞字元組的編修程度為基本設定;當編修權重值被設定為大於0.5時,表示字串編修操作對關鍵詞字元組的編修程度為加強設定(即編修程度高於基本設定);而當編修權重值被設定為小於0.5時,表示字串編修操作對關鍵詞字元組的編修程度為減弱設定(即編修程度低於基本設定)。In some embodiments, each of the photo editing buttons may have its own editing weight value, so that the processing module 220A may determine the editing degree of the keyword character group by the string editing operation based on the editing weight value corresponding to each of the photo editing buttons when executing step S330. In some embodiments, the editing weight value may be set to a value between 0 and 1, and a middle value (e.g., 0.5) may be used as a reference point. For example, when the editing weight value is set to 0.5, it means that the editing degree of the keyword character group by the string editing operation is the basic setting; when the editing weight value is set to greater than 0.5, it means that the editing degree of the keyword character group by the string editing operation is the enhanced setting (that is, the editing degree is higher than the basic setting); and when the editing weight value is set to less than 0.5, it means that the editing degree of the keyword character group by the string editing operation is the weakened setting (that is, the editing degree is lower than the basic setting).
在一個具體的範例中,第十一修圖按鍵(標籤內容顯示為吉卜力風格)的編修權重值可以被設定為大於0.5的值(例如,0.7),而第十二修圖按鍵(標籤內容顯示為比格犬)的編修權重值可以被設定為0.5的值。當使用者點選第十一修圖按鍵時,處理模組220A在執行步驟S330時可以以高於基本設定的編修程度來對關鍵詞字元組進行字串編修操作,藉以使第十一修圖按鍵的編修程度更為顯著。In a specific example, the editing weight value of the eleventh photo editing button (label content displayed as Ghibli style) can be set to a value greater than 0.5 (e.g., 0.7), and the editing weight value of the twelfth photo editing button (label content displayed as beagle) can be set to a value of 0.5. When the user clicks the eleventh photo editing button, the processing module 220A can perform a string editing operation on the keyword character group at a higher editing level than the basic setting when executing step S330, so that the editing effect of the eleventh photo editing button is more pronounced.
請同時參考圖8和圖9,圖8所示的編修字元組830是關鍵詞字元組520可以因應使用者點選兩個堆疊屬性值被設定為可堆疊的修圖按鍵(例如,先前所述的第十一修圖按鍵和第十二修圖按鍵)而被產生;而圖9所示的編修字元組930是關鍵詞字元組520可以進一步因應第十一修圖按鍵的編修權重值和第十二修圖按鍵的編修權重值而被產生。更具體地說,由於第十一修圖按鍵(標籤內容顯示為吉卜力風格)的編修權重值可以被設定為大於0.5的值(例如,0.7),因此如圖9所示的編修字元組930的內容為“ghibli++”。最後,處理模組220A透過執行步驟S340可以基於編修字元組930來產生人工智慧圖像940,藉此能夠產生更多樣化的人工智慧圖像。Please refer to FIG. 8 and FIG. 9 at the same time. The edit character set 830 shown in FIG. 8 is generated from the keyword character set 520 in response to the user clicking two photo editing buttons whose stacking attribute values are set as stackable (for example, the eleventh and twelfth photo editing buttons mentioned above); the edit character set 930 shown in FIG. 9 is further generated from the keyword character set 520 in response to the editing weight values of the eleventh and twelfth photo editing buttons. More specifically, since the editing weight value of the eleventh photo editing button (label content displayed as Ghibli style) can be set to a value greater than 0.5 (for example, 0.7), the content of the edit character set 930 shown in FIG. 9 is "ghibli++". Finally, the processing module 220A can generate an artificial intelligence image 940 based on the edit character set 930 by executing step S340, thereby generating more diverse artificial intelligence images.
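The "ghibli++" example suggests a mapping from editing weight values to prompt emphasis marks. The exact mapping below (one "+" or "-" per 0.1 away from the 0.5 baseline) is an assumption for illustration; the text only shows that a weight of 0.7 yields "ghibli++".

```python
# Hedged sketch converting an editing weight value into prompt emphasis.
# Assumption: 0.5 is the baseline, and each 0.1 step above/below it adds
# one '+' or '-' mark, consistent with the 0.7 -> "ghibli++" example.

def weighted_term(term, weight, baseline=0.5, step=0.1):
    if weight > baseline:
        return term + "+" * round((weight - baseline) / step)
    if weight < baseline:
        return term + "-" * round((baseline - weight) / step)
    return term

assert weighted_term("ghibli", 0.7) == "ghibli++"      # enhanced setting
assert weighted_term("beagle dog", 0.5) == "beagle dog"  # basic setting
```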
在一些實施例中,編修權重值可以是根據關鍵詞字元組的內容、使用者的歷史編修記錄及預設值中的至少一者而被設定。在一個具體的範例中,編修權重值可以根據預設值而被設定,例如,第十一修圖按鍵(標籤內容顯示為吉卜力風格)的編修權重值可以被設定為大於0.5的值(例如,0.7),而第十二修圖按鍵(標籤內容顯示為比格犬)的編修權重值可以被設定為0.5的值。在一個具體的範例中,編修權重值可以根據使用者的歷史編修記錄而被設定,例如,將使用者較常使用的修圖按鍵所對應的編修權重值設定為大於0.5的值。在一個具體的範例中,編修權重值可以根據關鍵詞字元組的內容而自動地被設定,例如,可以透過如先前所述的BERTScore來設定各個編修權重值。In some embodiments, the editing weight value may be set according to at least one of the content of the keyword character group, the user's historical editing record, and a default value. In a specific example, the editing weight value may be set according to the default value, for example, the editing weight value of the eleventh editing button (label content displayed as Ghibli style) may be set to a value greater than 0.5 (e.g., 0.7), and the editing weight value of the twelfth editing button (label content displayed as beagle) may be set to a value of 0.5. In a specific example, the editing weight value may be set according to the user's historical editing record, for example, the editing weight value corresponding to the editing button that the user uses more frequently is set to a value greater than 0.5. In a specific example, the edit weight value can be automatically set based on the content of the keyword character group. For example, each edit weight value can be set by BERTScore as described previously.
請參考圖10,圖10是說明本申請之第三實施例之用於產生圖像的方法的流程圖,更具體地說,圖10所示的方法能夠產生人工智慧圖像,而所述方法包括步驟S310、S320、S330、S340、S1010和S1020。步驟S310、S320、S330和S340與圖3所示的步驟基本相同,也就是說,圖10所示的方法可以包括與圖3基本相同的步驟S310、S320、S330和S340,並進一步包括步驟S1010和S1020。Please refer to FIG. 10 , which is a flow chart of a method for generating an image according to the third embodiment of the present application. More specifically, the method shown in FIG. 10 can generate an artificial intelligence image, and the method includes steps S310, S320, S330, S340, S1010, and S1020. Steps S310, S320, S330, and S340 are substantially the same as those shown in FIG. 3 , that is, the method shown in FIG. 10 may include steps S310, S320, S330, and S340 substantially the same as those shown in FIG. 3 , and further include steps S1010 and S1020.
在步驟S1010中,接收與物件的實體大小相對應的物件資訊。步驟S1010可以透過處理模組220A的物件資訊接收單元(圖未示)而被執行。所述物件資訊可以包括諸如長、寬和高的物件尺寸。更具體地說,當物件的實體大小的長和寬分別為第一尺寸和第二尺寸時,所述物件資訊接收單元可以接收第一尺寸和第二尺寸的物件資訊。在一個具體的範例中,當物件(例如,手機殼)的實體大小為151×75mm時,所述物件資訊接收單元可以接收151×75mm的物件資訊。In step S1010, object information corresponding to the physical size of the object is received. Step S1010 can be executed by the object information receiving unit (not shown) of the processing module 220A. The object information may include object dimensions such as length, width, and height. More specifically, when the length and width of the physical size of the object are a first size and a second size, respectively, the object information receiving unit can receive object information of the first size and the second size. In a specific example, when the physical size of an object (e.g., a mobile phone case) is 151×75 mm, the object information receiving unit can receive object information of 151×75 mm.
在步驟S1020中,基於所述物件資訊來決定人工智慧圖像的尺寸。步驟S1020可以透過處理模組220A的尺寸決定單元(圖未示)而被執行。在一些實施例中,步驟S1020可以接續在步驟S1010之後被執行。所述人工智慧圖像的尺寸可以包括諸如長和寬的尺寸。更具體地說,當透過執行步驟S1010所接收到的物件資訊分別為第一尺寸和第二尺寸時,處理模組220A可以透過執行步驟S1020來決定人工智慧圖像的尺寸分別為第一尺寸和第二尺寸。在一些實施例中,所述人工智慧圖像的尺寸可以是以像素(pixel)為單位。在一個具體的範例中,當所接收到的物件資訊為151×75mm時,處理模組220A可以透過執行步驟S1020來決定人工智慧圖像的尺寸為2000 pixels × 2000 pixels。也就是說,人工智慧圖像的圖像輪廓外框可以被設定為2000 pixels × 2000 pixels。In step S1020, the size of the artificial intelligence image is determined based on the object information. Step S1020 can be executed by a size determination unit (not shown) of the processing module 220A. In some embodiments, step S1020 can be executed after step S1010. The size of the artificial intelligence image may include dimensions such as length and width. More specifically, when the object information received by executing step S1010 is a first size and a second size, respectively, the processing module 220A can determine the size of the artificial intelligence image to be the first size and the second size by executing step S1020. In some embodiments, the size of the artificial intelligence image can be in pixels. In a specific example, when the received object information is 151×75 mm, the processing module 220A can determine the size of the artificial intelligence image to be 2000 pixels×2000 pixels by executing step S1020. That is, the image outline frame of the artificial intelligence image can be set to 2000 pixels×2000 pixels.
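Steps S1010 and S1020 can be sketched as a millimeters-to-pixels conversion. The DPI value and the square-canvas rule are assumptions; the text's own example maps a 151×75 mm phone case to a fixed 2000×2000 pixel canvas, which the sketch mirrors with a minimum side length.

```python
# Hedged sketch of steps S1010/S1020: turning a received physical object
# size into the AI image canvas. The DPI and the square-canvas-with-minimum
# rule are assumptions for illustration.

MM_PER_INCH = 25.4

def mm_to_pixels(mm, dpi=300):
    return round(mm / MM_PER_INCH * dpi)

def canvas_for_object(width_mm, height_mm, min_side=2000, dpi=300):
    # Use a square canvas covering the longer physical edge, but never
    # smaller than min_side (mirroring the 2000 x 2000 pixel example).
    side = max(mm_to_pixels(width_mm, dpi), mm_to_pixels(height_mm, dpi), min_side)
    return side, side

assert mm_to_pixels(50.8) == 600
assert canvas_for_object(151, 75) == (2000, 2000)  # matches the example
```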
在一些實施例中,如圖6所示的第一顯示區塊610的大小比例可以取決於物件的實體大小。更具體地說,處理模組220A在執行完步驟S1010和步驟S1020後,可以自動地將第一顯示區塊610的大小比例進行調整。In some embodiments, the size ratio of the first display area 610 as shown in Figure 6 may depend on the physical size of the object. More specifically, after executing step S1010 and step S1020, the processing module 220A may automatically adjust the size ratio of the first display area 610.
藉此,透過如圖10所示的步驟S1010和S1020可以進一步根據物件的實體大小來設定人工智慧圖像的尺寸,供使用者可以直接地產生與物件的實體大小基本相同的人工智慧圖像,藉以直觀地評估所產生的人工智慧圖像是否符合使用者的預期(包括人工智慧圖像中的各個元素的大小、位置和占比等)。Thus, through steps S1010 and S1020 as shown in FIG. 10 , the size of the AI image can be further set according to the physical size of the object, so that the user can directly generate an AI image that is substantially the same size as the physical size of the object, thereby intuitively evaluating whether the generated AI image meets the user's expectations (including the size, position, and proportion of each element in the AI image).
請參考圖11,圖11是說明本申請之另一個範例之基於至少一輸入圖像來產生關鍵詞字元組(即步驟S320)的詳細流程圖。也就是說,圖3所示的步驟S320可以包括步驟S1110、S1120、S1130、S1140和S1150,並且可以透過執行步驟S1110、S1120、S1130、S1140和S1150來完成步驟S320,其中,步驟S1110、S1120、S1130、S1140和S1150可以透過如圖2所示的處理模組220A而被執行。Please refer to FIG11, which is a detailed flowchart of another example of the present application for generating a keyword character set based on at least one input image (i.e., step S320). That is, step S320 shown in FIG3 may include steps S1110, S1120, S1130, S1140, and S1150, and step S320 may be completed by executing steps S1110, S1120, S1130, S1140, and S1150, wherein steps S1110, S1120, S1130, S1140, and S1150 may be executed by the processing module 220A shown in FIG2.
在步驟S1110中,接收裝飾圖像(具體可以是至少一裝飾圖像)。步驟S1110可以透過如圖2所示的處理模組220A的圖像接收單元221A而被執行。在一些實施例中,步驟S1110可以接續在步驟S310之後被執行。在一些實施例中,步驟S1110和步驟S310可以同時地被執行。在一些實施例中,步驟S1110可以類似於圖3所示的步驟S310,而步驟S1110與圖3所示的步驟S310的差異之處在於,透過執行步驟S1110所接收到的是裝飾圖像。在一些實施例中,所述裝飾圖像可以是指包括單一個物件並且已經去背完成的圖像,例如,籃球、麥克風、娃娃、項鍊、帽子或太陽眼鏡等。In step S1110, a decorative image (specifically, at least one decorative image) is received. Step S1110 may be executed by the image receiving unit 221A of the processing module 220A as shown in FIG2 . In some embodiments, step S1110 may be executed after step S310. In some embodiments, step S1110 and step S310 may be executed simultaneously. In some embodiments, step S1110 may be similar to step S310 shown in FIG3 , and the difference between step S1110 and step S310 shown in FIG3 is that what is received by executing step S1110 is a decorative image. In some embodiments, the decorative image may refer to an image that includes a single object and has been background-removed, such as a basketball, a microphone, a doll, a necklace, a hat, or sunglasses.
在步驟S1120中,將輸入圖像和裝飾圖像進行構圖操作(具體可以是至少一構圖操作),並產生編排圖像。步驟S1120可以透過處理模組220A的圖像編排單元(圖未示)而被執行。在一些實施例中,步驟S1120可以接續在步驟S1110之後被執行。所述構圖操作包括調整輸入圖像及/或裝飾圖像的大小及/或位置,但不限於此。在一些實施例中,使用者可以以手動操作的方式來透過處理模組220A對輸入圖像和裝飾圖像進行構圖操作,在一個具體的例子中,使用者可以操作智慧型手機並以手指點選輸入圖像和裝飾圖像以進行移動、翻轉、縮放等構圖操作,在另一個具體的例子中,使用者也可以操作個人電腦並使用滑鼠來對輸入圖像和裝飾圖像進行構圖操作,藉此可以供使用者能夠產生更符合使用者預期的編排圖像。在另一些實施例中,處理模組220A可以自動地對輸入圖像和裝飾圖像進行構圖操作,藉此可以更快速地產生編排圖像,並提供使用者一種可能的編排圖像。更具體地說,處理模組220A可以先分析與所述輸入圖像相對應的字串內容和與所述裝飾圖像相對應的字串內容,再對所述字串內容的內容進行調整以自動地產生新的字串內容,並基於所述新的字串內容來產生與所述新的字串內容相對應的圖像(即編排圖像)。In step S1120, the input image and the decorative image are subjected to a composition operation (specifically, at least one composition operation) and an arrangement image is generated. Step S1120 may be executed by an image arrangement unit (not shown) of the processing module 220A. In some embodiments, step S1120 may be executed after step S1110. The composition operation includes adjusting the size and/or position of the input image and/or the decorative image, but is not limited thereto. In some embodiments, the user can manually operate the processing module 220A to perform composition operations on the input image and the decorative image. In a specific example, the user can operate a smart phone and use a finger to select the input image and the decorative image to perform composition operations such as moving, flipping, and scaling. In another specific example, the user can also operate a personal computer and use a mouse to perform composition operations on the input image and the decorative image, thereby allowing the user to generate an arrangement image that better meets the user's expectations. In other embodiments, the processing module 220A can automatically perform composition operations on the input image and the decorative image, thereby generating an arrangement image more quickly and providing the user with a possible arrangement image. More specifically, the processing module 220A can first analyze the string content corresponding to the input image and the string content corresponding to the decorative image, then adjust the content of the string content to automatically generate new string content, and generate an image corresponding to the new string content (i.e., a layout image) based on the new string content.
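A composition operation of the kind described (moving and scaling a decorative image over the input image) can be sketched with a toy rectangle model. This is not the module's actual implementation: real compositing operates on bitmaps, and the (x, y, w, h) data model and function names are assumptions for illustration.

```python
# Sketch of composition operations: scaling and moving a decorative image
# within the input image's canvas. Rectangles are (x, y, w, h) tuples.

def scale(rect, factor):
    x, y, w, h = rect
    return (x, y, round(w * factor), round(h * factor))

def move(rect, dx, dy):
    x, y, w, h = rect
    return (x + dx, y + dy, w, h)

def fits_inside(inner, outer):
    ix, iy, iw, ih = inner
    ox, oy, ow, oh = outer
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh

canvas = (0, 0, 2000, 2000)          # input image area
decoration = (0, 0, 800, 800)        # e.g. a background-removed basketball
decoration = scale(decoration, 0.5)  # shrink to 400 x 400
decoration = move(decoration, 1500, 1500)
assert fits_inside(decoration, canvas)
```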
在步驟S1130中,將編排圖像輸入至影像描述模型270。步驟S1130可以透過如圖2所示的處理模組220A的字串產生單元222A和影像描述模型270而被執行。在一些實施例中,步驟S1130可以接續在步驟S1120之後被執行。在一些實施例中,步驟S1130可以類似於圖4A所示的步驟S410A,而步驟S1130與圖4A所示的步驟S410A的差異之處在於,透過執行步驟S1130是將所產生的編排圖像進行輸入。In step S1130, the layout image is input to the image description model 270. Step S1130 can be executed through the string generation unit 222A and the image description model 270 of the processing module 220A as shown in FIG2. In some embodiments, step S1130 can be executed after step S1120. In some embodiments, step S1130 can be similar to step S410A shown in FIG4A, and the difference between step S1130 and step S410A shown in FIG4A is that the generated layout image is input by executing step S1130.
在步驟S1140中,透過影像描述模型270來產生關鍵詞字元組。步驟S1140可以透過如圖2所示的處理模組220A的字串產生單元222A和影像描述模型270而被執行。在一些實施例中,步驟S1140可以接續在步驟S1130之後被執行。在一些實施例中,步驟S1140可以與圖4A所示的步驟S420A基本相同。In step S1140, a keyword character set is generated through the image description model 270. Step S1140 can be executed through the string generation unit 222A and the image description model 270 of the processing module 220A as shown in FIG2. In some embodiments, step S1140 can be executed after step S1130. In some embodiments, step S1140 can be substantially the same as step S420A shown in FIG4A.
在步驟S1150中,透過影像描述模型270來輸出關鍵詞字元組。步驟S1150可以透過如圖2所示的處理模組220A的字串產生單元222A和影像描述模型270而被執行。在一些實施例中,步驟S1150可以接續在步驟S1140之後被執行。在一些實施例中,步驟S1150可以與圖4A所示的步驟S430A基本相同。In step S1150, the keyword character set is outputted through the image description model 270. Step S1150 may be executed through the string generation unit 222A and the image description model 270 of the processing module 220A as shown in FIG2 . In some embodiments, step S1150 may be executed after step S1140. In some embodiments, step S1150 may be substantially the same as step S430A shown in FIG4A .
在一些實施例中,步驟S1140和步驟S1150可以被整合成一個步驟,即透過影像描述模型270來產生並輸出關鍵詞字元組。In some embodiments, step S1140 and step S1150 may be integrated into one step, that is, generating and outputting keyword character groups through the image description model 270.
藉此,透過執行如圖11所示的各個步驟,使用者在不需要額外自行輸入指令組的情況下,就能夠產生更多樣化的人工智慧圖像。Thus, by executing the steps shown in FIG. 11 , the user can generate more diverse artificial intelligence images without having to input additional command sets.
請參考圖12,圖12是說明本申請之第四實施例之用於產生圖像的方法的流程圖,更具體地說,圖12所示的方法能夠產生人工智慧圖像,而所述方法包括步驟S310、S320、S330、S340和S1210。步驟S310、S320、S330和S340與圖3所示的步驟基本相同,也就是說,圖12所示的方法可以包括與圖3基本相同的步驟S310、S320、S330和S340,並進一步包括步驟S1210。Please refer to FIG. 12, which is a flow chart of a method for generating an image according to the fourth embodiment of the present application. More specifically, the method shown in FIG. 12 can generate an artificial intelligence image, and the method includes steps S310, S320, S330, S340 and S1210. Steps S310, S320, S330 and S340 are substantially the same as the steps shown in FIG. 3, that is, the method shown in FIG. 12 may include steps S310, S320, S330 and S340 substantially the same as those in FIG. 3, and further includes step S1210.
在步驟S1210中,將人工智慧圖像傳輸至圖像輸出裝置,並透過圖像輸出裝置來將人工智慧圖像進行實體輸出。步驟S1210可以透過如圖2所示的處理模組220A的圖像輸出單元226A而被執行。在一些實施例中,步驟S1210可以接續在步驟S340之後被執行。更具體地說,處理模組220A在執行步驟S1210時,可以經由圖像輸出模組240而將人工智慧圖像輸出至圖像輸出裝置310,使得圖像輸出裝置310能夠將所述人工智慧圖像印製在實體的物件上。In step S1210, the artificial intelligence image is transmitted to the image output device, and the artificial intelligence image is physically outputted through the image output device. Step S1210 can be executed through the image output unit 226A of the processing module 220A as shown in FIG. 2 . In some embodiments, step S1210 can be executed after step S340. More specifically, when the processing module 220A executes step S1210, the artificial intelligence image can be output to the image output device 310 through the image output module 240, so that the image output device 310 can print the artificial intelligence image on a physical object.
藉此,透過如圖12所示的步驟S1210可以進一步將由處理模組220A所產生的人工智慧圖像輸出並印製在實體的物件上,使得所述人工智慧圖像可以更廣泛地被應用在諸如成衣印刷、汽車烤漆印刷、手機殼印刷或貼紙等的各種實體印製需求的領域。Thus, through step S1210 as shown in FIG. 12 , the artificial intelligence image generated by the processing module 220A can be further output and printed on a physical object, so that the artificial intelligence image can be more widely used in various fields of physical printing needs such as garment printing, automobile paint printing, mobile phone case printing or stickers.
請參考圖13,圖13是說明本申請之另一個實施例之處理模組220B的方塊示意圖。處理模組220B的配置可以與圖1所示的處理模組220基本相同,其不同之處在於,處理模組220B可以是專門被配置成基於所述輸入圖像來產生風格轉換圖像,而基於所述輸入圖像來產生人工智慧圖像的功能與配置可以被省略。Please refer to FIG. 13, which is a block diagram of a processing module 220B of another embodiment of the present application. The configuration of the processing module 220B may be substantially the same as the processing module 220 shown in FIG. 1, except that the processing module 220B may be specifically configured to generate a style conversion image based on the input image, and the function and configuration of generating an artificial intelligence image based on the input image may be omitted.
處理模組220B基本可以被配置成包括圖像接收單元221B、風格轉換請求接收單元223B、風格轉換單元224B和圖像輸出單元226B。在一些實施例中,處理模組220B可以進一步包括風格轉換模型290,即風格轉換模型290可以根據使用者的需求而選擇性地被配置於處理模組220B中。在一些實施例中,風格轉換模型290的功能與配置可以被整合進風格轉換單元224B中。The processing module 220B can be basically configured to include an image receiving unit 221B, a style conversion request receiving unit 223B, a style conversion unit 224B and an image output unit 226B. In some embodiments, the processing module 220B can further include a style conversion model 290, that is, the style conversion model 290 can be selectively configured in the processing module 220B according to the needs of the user. In some embodiments, the function and configuration of the style conversion model 290 can be integrated into the style conversion unit 224B.
圖像接收單元221B可以被配置成適用於接收輸入圖像(具體可以是至少一輸入圖像)。在一些實施例中,圖像接收單元221B的功能與配置可以與圖2所示的圖像接收單元221A基本相同。The image receiving unit 221B may be configured to be suitable for receiving an input image (specifically, at least one input image). In some embodiments, the function and configuration of the image receiving unit 221B may be substantially the same as the image receiving unit 221A shown in FIG. 2 .
風格轉換請求接收單元223B可以被配置成接收由風格轉換按鍵(如圖16所示的風格轉換按鍵1630)所發送的風格轉換請求,也就是說,當使用者點選至少一風格轉換按鍵中的任何一個時,被點選的風格轉換按鍵將會發送對應的風格轉換請求,使得風格轉換請求接收單元223B可以接收與被點選的風格轉換按鍵相對應的風格轉換請求。The style conversion request receiving unit 223B can be configured to receive a style conversion request sent by a style conversion button (such as the style conversion button 1630 shown in Figure 16), that is, when the user clicks any one of at least one style conversion button, the clicked style conversion button will send a corresponding style conversion request, so that the style conversion request receiving unit 223B can receive the style conversion request corresponding to the clicked style conversion button.
風格轉換單元224B可以被配置成將輸入圖像輸入至風格轉換模型290,並透過風格轉換模型290來產生和輸出風格轉換圖像。在一些實施例中,風格轉換模型290可以是例如AdaAttN(Adaptive Attention Normalization),但不限於此。所述AdaAttN可以從學習圖像學習淺層和深層的特徵,並計算各個點的加權統計量以及正規化,使得分析結果能夠表現出相同的局部特徵統計量。此外,所述AdaAttN導出新的局部特徵損失,藉以增強局部視覺質量。藉此,所述AdaAttN能夠將學習圖像進行一系列的計算與分析,並基於從學習圖像所學習到的特徵將輸入圖像轉換成具有所述特徵的風格轉換圖像。The style conversion unit 224B may be configured to input an input image into a style conversion model 290, and to generate and output a style conversion image through the style conversion model 290. In some embodiments, the style conversion model 290 may be, for example, AdaAttN (Adaptive Attention Normalization), but is not limited thereto. The AdaAttN may learn shallow and deep features from a learning image, and calculate weighted statistics and normalization of each point so that the analysis results can show the same local feature statistics. In addition, the AdaAttN derives a new local feature loss to enhance local visual quality. Thereby, the AdaAttN is able to perform a series of calculations and analyses on the learning image, and based on the features learned from the learning image, convert the input image into a style-converted image having the features.
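The statistic-matching idea behind AdaAttN can be illustrated with a greatly simplified sketch. Real AdaAttN computes attention-weighted per-point statistics and a local feature loss; the global per-channel version below is closer to plain AdaIN and is for illustration only, with invented function names.

```python
import numpy as np

# Simplified sketch of feature-statistic matching: re-normalize content
# features so their per-channel mean/std match the style features. AdaAttN
# itself uses attention-weighted local statistics rather than this global form.

def match_statistics(content, style, eps=1e-5):
    """content, style: arrays of shape (channels, n_points)."""
    c_mean = content.mean(axis=1, keepdims=True)
    c_std = content.std(axis=1, keepdims=True) + eps
    s_mean = style.mean(axis=1, keepdims=True)
    s_std = style.std(axis=1, keepdims=True) + eps
    return (content - c_mean) / c_std * s_std + s_mean

rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, (3, 64))
style = rng.normal(2.0, 0.5, (3, 64))
out = match_statistics(content, style)
# The output now carries the style's per-channel mean statistics.
assert np.allclose(out.mean(axis=1), style.mean(axis=1), atol=1e-6)
```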
在一些實施例中,風格轉換模型290可以基於學習圖像(具體可以是至少一學習圖像)的風格學習結果來對輸入圖像執行風格轉換操作,並產生風格轉換圖像。更具體地說,透過事先將所述學習圖像輸入至風格轉換模型290,使得風格轉換模型290能夠仿效所述學習圖像的特徵,藉此風格轉換模型290能夠基於所述學習圖像的特徵將所述輸入圖像轉換成具有所述學習圖像的特徵的風格轉換圖像。In some embodiments, the style conversion model 290 can perform a style conversion operation on the input image based on the style learning result of the learning image (specifically, at least one learning image) and generate a style conversion image. More specifically, by inputting the learning image into the style conversion model 290 in advance, the style conversion model 290 can imitate the characteristics of the learning image, thereby the style conversion model 290 can convert the input image into a style conversion image having the characteristics of the learning image based on the characteristics of the learning image.
在一些實施例中,學習圖像可以是如上所述的用於產生圖像的方法中的任何一種方法所產生的人工智慧圖像。藉此,風格轉換模型290可以效仿所述人工智慧圖像的特徵,使得風格轉換模型290能夠基於所述人工智慧圖像的特徵將所述輸入圖像轉換成具有所述人工智慧圖像的特徵的風格轉換圖像。In some embodiments, the learning image may be an artificial intelligence image generated by any of the methods for generating images described above. Thus, the style conversion model 290 may imitate the characteristics of the artificial intelligence image, so that the style conversion model 290 can convert the input image into a style conversion image having the characteristics of the artificial intelligence image based on the characteristics of the artificial intelligence image.
The image output unit 226B may be configured to output the style-converted image generated by the style conversion unit 224B. In some embodiments, the image output unit 226B may store the style-converted image in a specific file format in the image database 250. In some embodiments, the image output unit 226B may output the style-converted image in a specific file format to the image output device 310 and/or the server 320 through the image output module 240.
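The "store in a specific file format" behavior can be sketched minimally with the standard library alone. The function name, the raw PNG bytes, and the directory standing in for the image database 250 are all illustrative assumptions, not the patent's implementation.

```python
import pathlib
import tempfile

def store_converted_image(image_bytes, name, fmt="png", db_dir=None):
    """Write style-converted image bytes under the chosen file format.

    db_dir stands in for the image database 250; here it is just a plain
    directory, created on demand when none is supplied.
    """
    root = pathlib.Path(db_dir or tempfile.mkdtemp())
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{name}.{fmt}"        # file format chosen via extension
    path.write_bytes(image_bytes)
    return path
```

A real system would serialize pixel data through an imaging library rather than write raw bytes, but the flow (choose format, persist, hand back a locator) is the same.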
Thus, the processing module 220B shown in FIG. 13 can not only generate a style-converted image from the received input image, but can do so without requiring the user to enter an additional command set. In other words, the computing device for generating images provided by the present application lowers the operating threshold and/or difficulty for users, so that even users unfamiliar with command sets can produce the image files they desire through the computing device.
Please refer to FIG. 14, a flowchart illustrating a method for generating an image according to the fifth embodiment of the present application. More specifically, the method shown in FIG. 14 can generate a style-converted image, and the method includes steps S1410, S1420, S1430, and S1440. In some embodiments, the method shown in FIG. 14 may be implemented using Visual Studio Code as the development environment together with the PyTorch framework, with Python as the programming language. In some embodiments, the method shown in FIG. 14 may use libraries such as torch, itertools, and numpy.
In step S1410, an input image (specifically, at least one input image) is received. Step S1410 may be performed by the image receiving unit 221B of the processing module 220B shown in FIG. 13. In some embodiments, step S1410 may be substantially the same as step S310 shown in FIG. 3.
In step S1420, after a style conversion request sent from any one of the style conversion buttons (specifically, at least one style conversion button) is received, the input image is fed to the style conversion model 290 based on the style conversion request. Step S1420 may be performed by the style conversion request receiving unit 223B and the style conversion model 290 of the processing module 220B shown in FIG. 13. In some embodiments, step S1420 may be performed after step S1410.
In some embodiments, the style conversion buttons may be pre-established, so that the user can perform a style conversion operation on the input image through any one of them. In a specific example, the style conversion buttons may include a first style conversion button (labeled "ink wash painting style"), a second style conversion button (labeled "abstract style"), and a third style conversion button (labeled "sketch style"). When the user clicks the third style conversion button, it issues a style conversion request; after the style conversion request receiving unit 223B receives the request, the processing module 220B may feed the input image to the style conversion model 290. In some embodiments, the style conversion model 290 may be, for example, AdaAttN (Adaptive Attention Normalization), but is not limited thereto.
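The button-to-request dispatch above can be sketched as a small registry. The registry structure, function names, and style-image paths are illustrative assumptions; only the button labels mirror the example in the text.

```python
# Hypothetical pre-established button registry; labels follow the example
# in the text, and the style_image paths are illustrative placeholders.
STYLE_BUTTONS = {
    1: {"label": "ink wash painting style", "style_image": "styles/ink_wash.png"},
    2: {"label": "abstract style",          "style_image": "styles/abstract.png"},
    3: {"label": "sketch style",            "style_image": "styles/sketch.png"},
}

def make_style_conversion_request(button_id):
    """Build the style conversion request a clicked button would emit."""
    if button_id not in STYLE_BUTTONS:
        raise ValueError(f"unknown style conversion button: {button_id}")
    button = STYLE_BUTTONS[button_id]
    # The request carries everything the processing module needs to pick
    # the matching learning image for the style conversion model.
    return {"button_id": button_id, "style_image": button["style_image"]}
```

In this sketch, clicking the third button corresponds to `make_style_conversion_request(3)`, whose result the request-receiving side would use to select the sketch-style learning image.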
In step S1430, a style-converted image is generated through the style conversion model 290. Step S1430 may be performed by the style conversion model 290 of the processing module 220B shown in FIG. 13. In some embodiments, step S1430 may be performed after step S1420. The style conversion model 290 may convert the input image into a style-converted image bearing the features learned from the learning image, thereby producing the style-converted image.
In step S1440, the style-converted image is output through the style conversion model 290. Step S1440 may be performed by the style conversion model 290 of the processing module 220B shown in FIG. 13. In some embodiments, step S1440 may be performed after step S1430. That is, by executing step S1440, the processing module 220B can further perform subsequent processing on the style-converted image generated and output by the style conversion model 290 (for example, step S1710 shown in FIG. 17).
Thus, the method shown in FIG. 14 can not only generate a style-converted image from the received input image, but can do so without requiring the user to enter an additional command set. In other words, the computing device for generating images provided by the present application lowers the operating threshold and/or difficulty for users, so that even users unfamiliar with command sets can produce the image files they desire through the computing device.
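Steps S1410 through S1440 can be read as a short sequential driver. Every callable here is a stand-in assumption (the patent's units are hardware/software modules, not these functions); the sketch only shows the ordering the flowchart fixes.

```python
def generate_style_converted_image(receive_input, await_request, model):
    """Hypothetical driver for steps S1410-S1440 of FIG. 14.

    receive_input : returns the input image           (S1410)
    await_request : returns a style conversion request (S1420)
    model         : (image, request) -> converted image (S1430)
    The return value is the model's output, handed on
    for any subsequent processing                      (S1440)
    """
    input_image = receive_input()            # S1410: receive input image
    request = await_request()                # S1420: style conversion request
    converted = model(input_image, request)  # S1430: generate converted image
    return converted                         # S1440: output for further steps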
Please refer to FIG. 15, whose source is: Liu, S., Lin, T., He, D., Li, F., Wang, M., Li, X., ... & Ding, E. (2021). AdaAttN: Revisit attention mechanism in arbitrary neural style transfer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 6649-6658). FIG. 15 is a schematic diagram illustrating the results of converting at least one input image into different style-converted images according to the present application.
The input image may refer to any one of the input images 1501X to 1508X shown in FIG. 15, but is not limited thereto. The learning image may refer to any one of the learning images 1501Y to 1506Y shown in FIG. 15, but is not limited thereto. In a specific example, when the received input image is the input image 1508X and the user clicks the fourth style conversion button (labeled "sketch style"), the style conversion model 290 may convert the input image 1508X into a style-converted image 15086Z bearing the features learned from the learning image 1506Y.
Similarly, when another input image is received, the style conversion model 290 can likewise, according to the style conversion button clicked by the user, convert that input image into a style-converted image bearing the features learned from the learning image corresponding to that button. For example, the style conversion model 290 may convert the input image 1508X into style-converted images 15081Z to 15085Z bearing the features learned from the learning images 1501Y to 1505Y, respectively. Likewise, the other input images 1501X to 1507X can also be converted into their corresponding style-converted images.
Please refer to FIG. 16, a schematic diagram illustrating a display screen 1600 on a user device according to another embodiment of the present application. The display screen 1600 on the user device may basically include a first display block 1610 and style conversion buttons 1630 (specifically, at least one style conversion button 1630). It should be noted that the layout of the elements shown in FIG. 16 (for example, their relative sizes and/or positions) can be adjusted according to the user's needs; that is, the layout shown in FIG. 16 is merely an illustrative example.
The style conversion buttons 1630 allow the user to perform a style conversion operation on the input image by clicking any one of them. In some embodiments, the style conversion buttons 1630 may be pre-established. In other embodiments, the style conversion buttons 1630 may be automatically established based on at least one of the content of the character set corresponding to the input image and the user's historical editing records. In some embodiments, there may be a plurality of style conversion buttons 1630, so that the user can select a suitable one with a better user experience. In some embodiments, the label content corresponding to each style conversion button 1630 can be edited, so that the user can more intuitively understand, from the label content, the expected result of the style conversion operation that each button performs on the input image.
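One plausible reading of "automatically established based on the character set and historical editing records" is a simple frequency ranking. The ranking rule, function name, and inputs below are purely illustrative guesses, not the patent's mechanism.

```python
from collections import Counter

def suggest_style_buttons(history, keywords, limit=3):
    """Rank candidate style-button labels for automatic creation.

    history  : list of style labels the user picked in past edits
    keywords : labels derived from the character set of the input image
    Both inputs and the +1 keyword boost are illustrative assumptions.
    """
    counts = Counter(history)        # past picks weigh most heavily
    for kw in keywords:              # character-set keywords nudge styles up
        counts[kw] += 1
    return [label for label, _ in counts.most_common(limit)]
```

For a user who has mostly picked "sketch style" before, that label would surface first among the automatically created buttons.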
In some embodiments, the first display block 1610 may be configured to display the style-converted image. More specifically, after the user clicks any one of the style conversion buttons 1630, the generated style-converted image is shown in the first display block 1610. Through its displayed content, the user can evaluate whether the current style-converted image meets expectations and decide whether to regenerate a new style-converted image through the style conversion buttons 1630.
Please refer to FIG. 17, a flowchart illustrating a method for generating an image according to the sixth embodiment of the present application. More specifically, the method shown in FIG. 17 can generate a style-converted image, and the method includes steps S1410, S1420, S1430, S1440, and S1710. Steps S1410, S1420, S1430, and S1440 are substantially the same as those shown in FIG. 14; that is, the method shown in FIG. 17 may include steps S1410, S1420, S1430, and S1440 substantially the same as those of FIG. 14, and further includes step S1710.
In step S1710, the style-converted image is transmitted to an image output device, which physically outputs the style-converted image. In some embodiments, step S1710 may be performed after step S1440. In some embodiments, step S1710 may be similar to step S1210 shown in FIG. 12; the difference is that executing step S1710 outputs the generated style-converted image.
Thus, through step S1710 shown in FIG. 17, the style-converted image generated by the processing module 220B can further be output and printed on a physical object, so that the style-converted image can be applied more widely in fields with physical printing needs, such as garment printing, automotive paint printing, phone case printing, or stickers.
In some embodiments, the steps of any of the methods for generating an image described above may be stored in a computer-readable recording medium, which may be, for example, a hard disk, an optical disc, a magnetic disk, a flash drive, or a database accessible via a network, but is not limited thereto. After a computing device loads the program code from the computer-readable recording medium into memory and executes it, any of the methods for generating an image described above can be implemented.
In some embodiments, a computer program product for generating an image may embody the steps of the methods for generating an image described above, so that after a computing device loads and executes the computer program product, any of those methods can be implemented.
This application has been further described through the above embodiments and the accompanying drawings, but those of ordinary skill in the art to which this application pertains may still make many modifications and variations without departing from the scope and spirit set forth in the claims of this application. Therefore, the scope of protection of this application shall be defined by the claims and shall not be limited by the content disclosed in the specification.
110: User device 120: Server 200: Computing device 210: Image receiving module 220, 220A, 220B: Processing module 221A, 221B: Image receiving unit 222A: String generating unit 223A: Image editing request receiving unit 223B: Style conversion request receiving unit 224A: String editing unit 224B: Style conversion unit 225A: Image generating unit 226A, 226B: Image output unit 230: Storage module 240: Image output module 250: Image database 260: Editing database 270: Image description model 280: Image generating model 290: Style conversion model 310: Image output device 320: Server 510: Input image 520: Keyword character set 530: Edited character set 540: Artificial intelligence image 600: Display screen 610: First display block 620: Second display block 630: Image editing button 830: Edited character set 840: Artificial intelligence image 930: Edited character set 940: Artificial intelligence image 1501X–1508X: Input images 1501Y–1506Y: Learning images 15081Z–15086Z: Style-converted images 1600: Display screen 1610: First display block 1630: Style conversion button S310, S320, S330, S340: Steps S410A, S420A, S430A: Steps S410B, S420B, S430B: Steps S710: Step S1010, S1020: Steps S1110, S1120, S1130: Steps S1140, S1150: Steps S1210: Step S1410, S1420, S1430, S1440: Steps S1710: Step
FIG. 1 is a schematic diagram illustrating the connection relationship in which a computing device according to an embodiment of the present application is signal-connected to at least one of a user device and a server, and to at least one of an image output device and a server.
FIG. 2 is a block diagram illustrating a processing module according to an embodiment of the present application.
FIG. 3 is a flowchart illustrating a method for generating an image according to the first embodiment of the present application.
FIG. 4A is a detailed flowchart illustrating an example of generating a keyword character set based on at least one input image according to the present application.
FIG. 4B is a detailed flowchart illustrating an example of generating an artificial intelligence image based on an edited character set according to the present application.
FIG. 5 is a schematic diagram illustrating the results produced after each step is executed in an example of the present application.
FIG. 6 is a schematic diagram illustrating a display screen on a user device according to an embodiment of the present application.
FIG. 7 is a flowchart illustrating a method for generating an image according to the second embodiment of the present application.
FIG. 8 is a schematic diagram illustrating the results produced after each step is executed in another example of the present application.
FIG. 9 is a schematic diagram illustrating the results produced after each step is executed in yet another example of the present application.
FIG. 10 is a flowchart illustrating a method for generating an image according to the third embodiment of the present application.
FIG. 11 is a detailed flowchart illustrating another example of generating a keyword character set based on at least one input image according to the present application.
FIG. 12 is a flowchart illustrating a method for generating an image according to the fourth embodiment of the present application.
FIG. 13 is a block diagram illustrating a processing module according to another embodiment of the present application.
FIG. 14 is a flowchart illustrating a method for generating an image according to the fifth embodiment of the present application.
FIG. 15 is a schematic diagram illustrating the results of converting at least one input image into different style-converted images according to the present application.
FIG. 16 is a schematic diagram illustrating a display screen on a user device according to another embodiment of the present application.
FIG. 17 is a flowchart illustrating a method for generating an image according to the sixth embodiment of the present application.
S310: Step
S320: Step
S330: Step
S340: Step
Claims (22)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/605,872 US20240371049A1 (en) | 2023-05-04 | 2024-03-15 | Method, computer device, and non-transitory computer-readable recording medium for generating image |
| EP24167179.1A EP4492329A1 (en) | 2023-05-04 | 2024-03-28 | Method, computer device, non-transitory computer-readable recording medium, and computer program product for generating image |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363500267P | 2023-05-04 | 2023-05-04 | |
| US63/500,267 | 2023-05-04 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW202445391A TW202445391A (en) | 2024-11-16 |
| TWI870063B true TWI870063B (en) | 2025-01-11 |
Family
ID=94377592
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW112139431A TWI870063B (en) | 2023-05-04 | 2023-10-16 | The method, the computing device, the computer-readable storage medium, and the computer program product for generating an image |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI870063B (en) |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7028253B1 (en) * | 2000-10-10 | 2006-04-11 | Eastman Kodak Company | Agent for integrated annotation and retrieval of images |
| TW202016691A (en) * | 2018-10-29 | 2020-05-01 | 聯發科技股份有限公司 | Mobile device and video editing method thereof |
| CN112106042A (en) * | 2018-05-29 | 2020-12-18 | 三星电子株式会社 | Electronic device and control method thereof |
| TWI735112B (en) * | 2019-03-18 | 2021-08-01 | 大陸商北京市商湯科技開發有限公司 | Method, apparatus and electronic device for image generating and storage medium thereof |
| CN113672086A (en) * | 2021-08-05 | 2021-11-19 | 腾讯科技(深圳)有限公司 | A page processing method, apparatus, device and medium |
| CN114792388A (en) * | 2021-01-25 | 2022-07-26 | 北京三星通信技术研究有限公司 | Image description character generation method and device and computer readable storage medium |
| CN116051388A (en) * | 2021-10-27 | 2023-05-02 | 奥多比公司 | Automatic photo editing via language request |
- 2023-10-16 TW TW112139431 patent/TWI870063B/en active