JP2000301782A

JP2000301782A - Computer-implemented method for approximating gray scale gradation using an image forming apparatus of more limited range, and printer using the same

Info

Publication number: JP2000301782A
Application number: JP11223325A
Authority: JP
Inventors: K Ganapashii Purabiin; ケイ．ガナパシイプラビーン; Rabii S; エス．ラビイ; Kumaru Sakuru Bibeku; クマルサクルビベク; Surinibasan Aaru; アール．スリニバサン
Original assignee: Texas Instruments Inc
Current assignee: Texas Instruments Inc
Priority date: 1998-09-16
Filing date: 1999-08-06
Publication date: 2000-10-31

Abstract

PROBLEM TO BE SOLVED: To shorten processing time and relax memory requirements in a method, and in a printer, that approximates gray scale gradation using an image forming apparatus of more limited range. SOLUTION: The printer renders objects expressed in a page description language into scan lines (401) and, based on that rendering, determines the image regions that contain rendered objects. To do so, it determines bounding boxes (403, 405) that enclose all of the rendered objects and derives the image regions from them. Input pixels inside those image regions are screened (407); input pixels outside them are not. Identifying where screening is unnecessary shortens the time spent on screening, and dividing each row of the preference matrix into segments improves memory use.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

The technical field of the invention is printers, and more specifically the printer electronics that convert input data in page description file format into control signals for a print engine.

[0002]

Description of the Related Art: Screening is the process of giving the illusion of a continuous-tone image on a display that can only draw digital pixels. In printing an image, the printing device must simulate the many gray levels of the input image in order to reproduce the original faithfully. In a printed image, however, the pixel resolution can be limited to what the eye can perceive. Grouping adjacent pixels therefore makes it possible to simulate continuous tone in the image.

[0003] Screening can be performed by a threshold method in one of two categories: two-level threshold screening and multi-level threshold screening. In two-level threshold screening, the (x, y) coordinates of the input pixel are used to index into a two-dimensional m × n matrix. Each entry in the matrix is a gray level threshold that is compared against the gray level of the input pixel. A binary value (0 or 1) is output based on the result of the comparison. Multi-level threshold screening indexes into a three-dimensional lookup table. This three-dimensional lookup table is organized as a two-dimensional preference matrix of size M × N. The preference matrix is a repeatable spatial tile in image space. Each entry of the preference matrix holds the number of the tone curve that must be used at that (x, y) position. A tone curve is a compensation transfer function that maps the gray value range of the input pixel into the range of the printing process. The tone curve transfer function is quantized against a set of thresholds and stored as a lookup table. Each lookup table contains 2^b entries, one for each value of an unscreened input pixel of b bits, and each of those 2^b entries holds the corresponding screened output pixel of c bits. This process provides a way to translate the large dynamic range of the input image into the smaller dynamic range of the printer by mixing colors within the printer's dynamic range.
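As an illustration only, the two threshold methods described above can be sketched in Python. The function names, the use of nested lists, and the direction of the comparison (output 1 when the pixel exceeds the threshold) are assumptions made for this sketch; the text does not specify them.

```python
from typing import List, Sequence


def screen_bilevel(image: Sequence[Sequence[int]],
                   thresholds: Sequence[Sequence[int]]) -> List[List[int]]:
    """Two-level screening: compare each pixel with the tiled threshold matrix.

    Assumption: a pixel strictly greater than its threshold yields 1.
    """
    rows, cols = len(thresholds), len(thresholds[0])
    return [
        [1 if pixel > thresholds[y % rows][x % cols] else 0
         for x, pixel in enumerate(line)]
        for y, line in enumerate(image)
    ]


def screen_multilevel(image: Sequence[Sequence[int]],
                      preference: Sequence[Sequence[int]],
                      tone_luts: Sequence[Sequence[int]]) -> List[List[int]]:
    """Multi-level screening: the tiled preference matrix selects a tone-curve
    lookup table for each position, and the input gray value indexes into it.

    Each table in tone_luts is expected to hold 2**b entries of c-bit outputs.
    """
    rows, cols = len(preference), len(preference[0])
    return [
        [tone_luts[preference[y % rows][x % cols]][pixel]
         for x, pixel in enumerate(line)]
        for y, line in enumerate(image)
    ]
```

With b = 2, for example, each table in `tone_luts` has four entries; the preference matrix then only needs to store small table numbers, which is what makes the tile compact.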

[0004]

Problem to Be Solved by the Invention: The present invention concerns approximating gray scale tones with an image producing device of more limited range, a process known as screening. The invention identifies when screening is not needed. The rendering process determines the image regions that contain rendered objects. Screening is performed in those regions and not in others. Screening divides each row of the preference matrix into segments. The lookup tables associated with these segments are loaded into a memory cache one segment at a time. Input pixels that map to the lookup tables of the loaded segment are screened. The lookup tables associated with the next segment of the preference matrix are then loaded into the memory cache and used to screen the input pixels that fall within that segment. Even when the preference matrix has an odd row length M, the method packs two output pixels into one data word during multi-level screening by alternately considering M-1 input pixels and M+1 input pixels. Because each set of M-1 or M+1 input pixels is even in number, an even number of pixels is always available for packing into the output data words.
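The alternation between M-1 and M+1 input pixels can be illustrated with a short Python sketch. It is a simplified reading of the paragraph above, not the disclosed implementation: the 4-bit output pixel, the 8-bit data word, the even scan-line width, and the names `even_runs` and `screen_row_packed` are all assumptions introduced here.

```python
from typing import Iterator, List, Sequence, Tuple


def even_runs(width: int, m: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, length) runs of alternately m-1 and m+1 pixels."""
    if m < 3 or m % 2 == 0:
        raise ValueError("sketch assumes an odd row length m >= 3")
    start, short = 0, True
    while start < width:
        # The last run is clipped to the end of the scan line.
        length = min(m - 1 if short else m + 1, width - start)
        yield start, length
        start += length
        short = not short


def screen_row_packed(row: Sequence[int],
                      pref_row: Sequence[int],
                      luts: Sequence[Sequence[int]]) -> List[int]:
    """Screen one scan line and pack two 4-bit output pixels per byte."""
    if len(row) % 2:
        raise ValueError("sketch assumes an even scan-line width")
    m = len(pref_row)
    packed: List[int] = []
    for start, length in even_runs(len(row), m):
        # Every run is even, so a pixel pair never straddles two runs.
        for x in range(start, start + length, 2):
            high = luts[pref_row[x % m]][row[x]]
            low = luts[pref_row[(x + 1) % m]][row[x + 1]]
            packed.append((high << 4) | low)
    return packed
```

Two consecutive runs cover 2M pixels, so the position within the preference matrix row realigns after every pair of runs even though neither run has length M.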

[0005]

Embodiments of the Invention: FIG. 1 is a block diagram of a network printer system 1 that includes a multiprocessor integrated circuit 100 configured for image and graphics processing according to the present invention. Multiprocessor integrated circuit 100 provides the data processing, including data manipulation and computation, for the image operations of the network printer system of FIG. 1.

[0006] FIG. 1 shows a transceiver 3. Transceiver 3 provides translation and bidirectional communication between the network printer bus and a communication channel. One example of a system that uses transceiver 3 is a local area network. The network printer system shown in FIG. 1 responds to print requests received over the communication channel of the local area network. Multiprocessor integrated circuit 100 interprets print jobs specified in a page description language such as PostScript.

[0007] FIG. 1 shows a system memory 4 connected to the network printer system bus. This memory may include video random access memory, dynamic random access memory, static random access memory, non-volatile memory such as EPROM, FLASH or read-only memory, or a combination of these memory types. Multiprocessor integrated circuit 100 may be controlled entirely or partially by a program stored in memory 4. Memory 4 may also store various types of graphic image data.

[0008] In the network printer system of FIG. 1, multiprocessor integrated circuit 100 communicates with print buffer memory 5 to specify the printable image by means of a pixel map. Multiprocessor integrated circuit 100 controls the image data stored in print buffer memory 5 via network printer system bus 2. The data corresponding to this image is recalled from print buffer memory 5 and supplied to print engine 6. Print engine 6 provides the mechanism that places color dots on the printed page. Print engine 6 also responds to control signals supplied by multiprocessor integrated circuit 100 for paper and print head control. Multiprocessor integrated circuit 100 determines and controls where the print information is stored in print buffer memory 5. Subsequently, during readout from print buffer memory 5, multiprocessor integrated circuit 100 determines the readout sequence from print buffer memory 5, the addresses to be accessed, and the control information needed for print engine 6 to produce the desired printed image.

[0009] According to the preferred embodiment, the invention uses multiprocessor integrated circuit 100. This preferred embodiment includes a plurality of identical processors that embody the invention. Each of these processors will be called a digital image/graphics processor. This description is a matter of convenience only. The processors embodying the invention may be separately fabricated on a single integrated circuit or on a plurality of integrated circuits. If embodied on a single integrated circuit, that integrated circuit may optionally also include read-only memory and random access memory used by the digital image/graphics processors.

[0010] FIG. 2 shows the architecture of multiprocessor integrated circuit 100 of the preferred embodiment of the invention. Multiprocessor integrated circuit 100 includes two random access memories 10 and 20, each divided into a plurality of sections; a crossbar 50; a master processor 60; digital image/graphics processors 71, 72, 73 and 74; a transfer controller 80, which mediates access to system memory; and a frame controller 90, which can control access to independent first and second image memories. Multiprocessor integrated circuit 100 provides a high degree of operational parallelism that is useful in image processing and graphics operations, such as those in multimedia computing.

[0011] Multiprocessor integrated circuit 100 has two random access memories. Random access memory 10 is primarily assigned to master processor 60. Random access memory 10 has two instruction cache memories 11 and 12, two data cache memories 13 and 14, and a parameter memory 15. These memory sections can be physically identical, but they are connected and used differently. Random access memory 20 can be accessed by master processor 60 and by each of digital image/graphics processors 71, 72, 73 and 74. Each of digital image/graphics processors 71, 72, 73 and 74 has five corresponding memory sections: one instruction cache memory, three data memories, and one parameter memory. Thus digital image/graphics processor 71 has corresponding instruction cache memory 21, data memories 22, 23 and 24, and parameter memory 25; digital image/graphics processor 72 has corresponding instruction cache memory 26, data memories 27, 28 and 29, and parameter memory 30; digital image/graphics processor 73 has corresponding instruction cache memory 31, data memories 32, 33 and 34, and parameter memory 35; and digital image/graphics processor 74 has corresponding instruction cache memory 36, data memories 37, 38 and 39, and parameter memory 40. Like the sections of random access memory 10, these memory sections can be physically identical but are connected and used differently. Each of these memory sections of memories 10 and 20 preferably holds 2 Kbytes, for a total of 50 Kbytes of memory within multiprocessor integrated circuit 100.

[0012] Multiprocessor integrated circuit 100 provides a high data transfer rate between the processors and memory by using a plurality of independent parallel data transfers. Each of digital image/graphics processors 71, 72, 73 and 74 has three memory ports, which may operate simultaneously in each cycle. The instruction port (I) can fetch a 64-bit data word from the corresponding instruction cache. The local data port (L) can read one 32-bit data word from, or write one 32-bit data word to, the data memories or the parameter memory corresponding to that digital image/graphics processor. The global data port (G) can read one 32-bit data word from, or write one 32-bit data word to, any of the data memories or parameter memories of random access memory 20. Master processor 60 has two memory ports. The instruction port (I) can fetch a 32-bit instruction word from either of instruction caches 11 and 12. The data port (C) can read one 32-bit data word from, or write one 32-bit data word to, data cache 13 or 14 or parameter memory 15 of random access memory 10, or any of the data memories or parameter memories of random access memory 20. Transfer controller 80 can access any section of random access memory 10 or 20 via the data port (C). Thus fifteen parallel memory accesses may be requested in any single memory cycle. Random access memories 10 and 20 are divided into 25 memories in order to support this many parallel accesses.
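The figures quoted above (25 memory sections, 50 Kbytes in total, fifteen parallel accesses) follow directly from the section and port counts in the text. The short Python sketch below only reproduces that arithmetic; the function and constant names are hypothetical.

```python
SECTION_BYTES = 2 * 1024  # each memory section preferably holds 2 Kbytes
DSP_COUNT = 4             # digital image/graphics processors 71-74


def memory_sections() -> int:
    """Count the memory sections of random access memories 10 and 20."""
    master = 2 + 2 + 1   # instruction caches 11-12, data caches 13-14, parameter memory 15
    per_dsp = 1 + 3 + 1  # instruction cache, three data memories, parameter memory
    return master + DSP_COUNT * per_dsp


def peak_parallel_accesses() -> int:
    """Count the memory ports that may request access in a single cycle."""
    dsp_ports = DSP_COUNT * 3      # ports I, L and G on each processor
    master_ports = 2               # ports I and C on master processor 60
    transfer_controller_ports = 1  # transfer controller 80
    return dsp_ports + master_ports + transfer_controller_ports


def total_memory_bytes() -> int:
    """Total on-chip memory implied by the section count."""
    return memory_sections() * SECTION_BYTES
```

The section count exceeds the port count, which is consistent with the stated aim of letting all fifteen requests proceed at once when they address different sections.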

[0013] Crossbar 50 controls the connections of master processor 60, digital image/graphics processors 71, 72, 73 and 74, and transfer controller 80 with memories 10 and 20. Crossbar 50 has a plurality of crosspoints 51 arranged in rows and columns. Each column of crosspoints 51 corresponds to a single memory section and a corresponding range of addresses. A processor requests access to one of the memory sections through the most significant bits of the address it outputs. The address output by the processor travels along a row. The crosspoint 51 corresponding to the memory section that contains the address responds by either granting or denying access to that memory section. If no other processor has requested access to the memory section during the current memory cycle, crosspoint 51 grants access by coupling the row and the column. This supplies the address to the memory section. The memory section responds by permitting data access at that address. The data access may be either a data read operation or a data write operation.

[0014] If two or more processors request access to the same memory section at the same time, crossbar 50 grants access to only one of the requesting processors. The crosspoints 51 in each column of crossbar 50 communicate and grant access based on priority. If two access requests of the same rank occur at the same time, crossbar 50 grants access on a round-robin basis, so that the processor most recently granted access has the lowest priority. Each granted access lasts as long as is needed to service the request. Because the processors may change their addresses every memory cycle, crossbar 50 can change the interconnection between processors and memory sections on a cycle-by-cycle basis.
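The access rules of the crossbar (section selection by upper address bits, priority first, round-robin among equal priorities) can be sketched in Python as follows. This is a behavioural illustration under stated assumptions, not a model of the actual circuit: the contiguous address map starting at 0, the requester names, and the convention that a larger number means higher priority are all hypothetical.

```python
from typing import Dict, List, Optional

SECTION_BYTES = 2 * 1024  # each memory section holds 2 Kbytes


def section_of(address: int) -> int:
    """Select a memory section from the upper address bits.

    Assumption: sections are contiguous and start at address 0.
    """
    return address // SECTION_BYTES


class SectionArbiter:
    """Grants one requester per memory cycle for a single memory section."""

    def __init__(self, requesters: List[str]) -> None:
        # Front of the list is served first among equal priorities.
        self._order = list(requesters)

    def grant(self, requests: Dict[str, int]) -> Optional[str]:
        """Return the winner for this cycle, or None if nobody asked.

        requests maps a requester name to its priority (larger wins).
        """
        unknown = set(requests) - set(self._order)
        if unknown:
            raise ValueError(f"unknown requesters: {sorted(unknown)}")
        if not requests:
            return None
        top = max(requests.values())
        winner = next(n for n in self._order if requests.get(n) == top)
        # The most recently granted requester drops to the lowest rank.
        self._order.remove(winner)
        self._order.append(winner)
        return winner
```

One arbiter instance would stand for one column of crosspoints; calling `grant` once per memory cycle mirrors the cycle-by-cycle reconnection described above.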

[0015] Master processor 60 preferably performs the major control functions for multiprocessor integrated circuit 100. Master processor 60 is preferably a 32-bit reduced instruction set computer (RISC) processor that includes a hardware floating point calculation unit. According to the RISC architecture, all accesses to memory are performed with load and store instructions, and most integer and logical operations are performed on registers in a single cycle. The floating point calculation unit, however, generally takes several cycles to perform an operation when using the same register file as the integer and logic unit. A register scoreboard ensures that the correct register access sequence is maintained. The RISC architecture is suitable for control functions in image processing. The floating point calculation unit permits rapid computation of image rotation functions, which may be important to image processing.

[0016] Master processor 60 fetches instruction words from instruction cache memory 11 or instruction cache memory 12. Likewise, master processor 60 fetches data from either data cache 13 or data cache 14. Because each memory section holds 2 Kbytes of memory, there are 4 Kbytes of instruction cache and 4 Kbytes of data cache. Cache control is an integral function of master processor 60. As noted above, master processor 60 can also access other memory sections via crossbar 50.

[0017] The four digital image/graphics processors 71, 72, 73 and 74 each have a highly parallel digital signal processor (DSP) architecture. FIG. 3 shows an overview of exemplary digital image/graphics processor 71, which is identical to digital image/graphics processors 72, 73 and 74. Digital image/graphics processor 71 achieves a high degree of parallelism of operation by using three separate units: data unit 110, address unit 120, and program flow control unit 130. These three units operate simultaneously on different instructions in an instruction pipeline. In addition, each of these units has internal parallelism of its own.

[0018] Digital image/graphics processors 71, 72, 73 and 74 can execute independent instruction streams in multiple instruction multiple data (MIMD) mode. In MIMD mode, each digital image/graphics processor executes an individual program from its corresponding instruction cache. The programs may be independent or cooperative. In the latter case, crossbar 50 enables inter-processor communication in combination with the shared memory. Digital image/graphics processors 71, 72, 73 and 74 can also operate in a synchronized MIMD mode. In synchronized MIMD mode, program flow control unit 130 of each digital image/graphics processor inhibits fetching of the next instruction until all synchronized processors are ready to proceed. This synchronized MIMD mode allows the separate programs of the digital image/graphics processors to be executed in lockstep as one closely coupled operation.
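As a software analogy for the synchronized MIMD mode, the following Python sketch runs separate programs on separate threads and holds each one at a barrier until all are ready for the next step. It only illustrates the lockstep rule; the hardware mechanism in program flow control unit 130 is not described at this level, and `run_lockstep` and its log format are hypothetical.

```python
import threading
from typing import Callable, List, Sequence, Tuple


def run_lockstep(
    programs: Sequence[Sequence[Callable[[], None]]]
) -> List[Tuple[int, int]]:
    """Run one program per thread, one step at a time, in lockstep.

    Returns a log of (step, processor) pairs in completion order.
    """
    steps = len(programs[0])
    if any(len(program) != steps for program in programs):
        raise ValueError("sketch assumes programs of equal length")
    barrier = threading.Barrier(len(programs))
    log: List[Tuple[int, int]] = []
    lock = threading.Lock()

    def worker(pid: int, program: Sequence[Callable[[], None]]) -> None:
        for step, instruction in enumerate(program):
            barrier.wait()  # no processor starts a step until all are ready
            instruction()
            with lock:
                log.append((step, pid))

    threads = [
        threading.Thread(target=worker, args=(pid, program))
        for pid, program in enumerate(programs)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

Because every thread must reach the barrier before any thread proceeds, all entries for step k appear in the log before any entry for step k+1, even though the programs themselves differ.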

[0019] Digital image/graphics processors 71, 72, 73 and 74 can execute identical instructions on different data in single instruction multiple data (SIMD) mode. In this mode, a single instruction stream for the four digital image/graphics processors comes from instruction cache memory 21. Digital image/graphics processor 71 controls the fetch and branch operations, and crossbar 50 supplies the same instruction to the other digital image/graphics processors 72, 73 and 74. Because digital image/graphics processor 71 controls instruction fetch for all of digital image/graphics processors 71, 72, 73 and 74, the digital image/graphics processors are inherently synchronized in SIMD mode.

[0020] Transfer controller 80 is a combined direct memory access (DMA) machine and memory interface for multiprocessor integrated circuit 100. Transfer controller 80 intelligently queues, prioritizes, and services the data requests and cache misses of the five programmable processors. Master processor 60 and digital image/graphics processors 71, 72, 73 and 74 all access memory and systems external to multiprocessor integrated circuit 100 via transfer controller 80. Cache misses in the data caches or instruction caches are handled automatically by transfer controller 80. The cache service port (S) transmits such cache misses to transfer controller 80. The cache service port (S) reads information from the processors, not from memory. Master processor 60 and digital image/graphics processors 71, 72, 73 and 74 request data transfers from transfer controller 80 as linked list packet requests. These linked list packet requests also allow multi-dimensional blocks of information to be transferred between source and destination memory addresses. These memory addresses may be internal or external to multiprocessor integrated circuit 100. Transfer controller 80 preferably also includes a refresh controller for dynamic random access memory (DRAM). DRAM requires periodic refresh to retain its data.
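The idea of a linked list of packet requests, each describing a multi-dimensional block, can be illustrated with a small Python sketch. The packet layout is not given in the text, so every field name below (`src_offset`, `row_bytes`, `src_pitch`, `next_packet`, and so on) is an assumption chosen for illustration, and the sketch covers only two-dimensional blocks within valid bounds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PacketRequest:
    """One hypothetical packet: a 2-D block plus a link to the next packet."""
    src_offset: int   # byte offset of the block's first row in the source
    dst_offset: int   # byte offset of the block's first row in the destination
    row_bytes: int    # bytes copied per row
    rows: int         # number of rows in the block
    src_pitch: int    # distance in bytes between row starts in the source
    dst_pitch: int    # distance in bytes between row starts in the destination
    next_packet: Optional["PacketRequest"] = None


def service_packets(head: Optional[PacketRequest],
                    src: bytes, dst: bytearray) -> int:
    """Walk the linked list and copy each block; return bytes moved."""
    moved = 0
    packet = head
    while packet is not None:
        for r in range(packet.rows):
            s = packet.src_offset + r * packet.src_pitch
            d = packet.dst_offset + r * packet.dst_pitch
            dst[d:d + packet.row_bytes] = src[s:s + packet.row_bytes]
            moved += packet.row_bytes
        packet = packet.next_packet
    return moved
```

Separate source and destination pitches are what let one request gather a rectangular window from a wide image into a compact buffer, which is the typical reason to describe a transfer as a block rather than as a flat byte range.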

[0021] Frame controller 90 is the interface between multiprocessor integrated circuit 100 and external image capture and display systems. Frame controller 90 provides control over capture and display devices and automatically manages the movement of data between these devices and memory. To this end, frame controller 90 provides simultaneous control of two independent image systems. These image systems typically include a first image system for image capture and a second image system for image display, although the use of frame controller 90 is controlled by the user. These image systems usually include independent frame memories used for frame grabber or frame buffer storage. Frame controller 90 preferably operates to control video dynamic random access memory (VRAM) through refresh and shift register control.

[0022] Multiprocessor integrated circuit 100 is designed for large-scale image processing. Master processor 60 provides embedded control: it orchestrates the activities of digital image/graphics processors 71, 72, 73 and 74 and interprets the results they produce. Digital image/graphics processors 71, 72, 73 and 74 are well suited to pixel analysis and manipulation. If pixels are regarded as high in data but low in information, then in a typical application digital image/graphics processors 71, 72, 73 and 74 might examine the pixels and turn the raw data into information. This information can then be analyzed either by digital image/graphics processors 71, 72, 73 and 74 or by master processor 60. Crossbar 50 mediates inter-processor communication. Crossbar 50 allows multiprocessor integrated circuit 100 to be implemented as a shared-memory system. Message passing need not be the primary form of communication in this architecture, but messages can be passed via the shared memory. Each digital image/graphics processor, the corresponding section of crossbar 50, and the corresponding sections of memory 20 have the same width. The architecture can therefore be made flexible by accommodating the addition or removal of digital image/graphics processors and their corresponding memory in a modular fashion while keeping the same pinout.

[0023] In the preferred embodiment, all parts of multiprocessor integrated circuit 100 are disposed on a single integrated circuit. In the preferred embodiment, multiprocessor integrated circuit 100 is formed in complementary metal oxide semiconductor (CMOS) using a feature size of 0.6 μm. Multiprocessor integrated circuit 100 is preferably constructed in a pin grid array package having 256 pins. The inputs and outputs are preferably compatible with transistor-transistor logic (TTL) logic voltages. Multiprocessor integrated circuit 100 preferably includes 3 million transistors and uses a clock rate of 50 MHz.

【００２４】図３は模範的な各デジタルイメージ・グラ
フィックプロセッサ７１の概要を示しているが、デジタ
ルイメージ・グラフィックプロセッサ７２、７３及び７
４と事実上同一のものである。デジタルイメージ・グラ
フィックプロセッサ７１は、データユニット１１０と、
アドレスユニット１２０と、プログラムフロー制御ユニ
ット１３０を有する。データユニット１１０は、論理あ
るいは算術データ演算を行う。データユニット１１０
は、８つのデータレジスタＤ７−Ｄ０、１つのステータ
スレジスタ２１０、および、１つの複数フラグレジスタ
２１１を有する。アドレスユニット１２０は、ローカル
データポートとグローバルデータポートに対するロード
・ストアアドレスの生成を制御する。以下さらに説明す
るように、アドレスユニット１２０は、２つの事実上同
一のアドレス指定ユニットを有する。１つはローカルア
ドレス指定用であり、もう１つはグローバルアドレス指
定用である。これらのアドレス指定ユニットは、それぞ
れ、相対アドレスモードにおいて絶対アドレス指定を可
能にする１つのＡＬＬ「０」読取専用レジスタと、１つ
のスタックポインタと、５つのアドレスレジスタと、３
つのインデックスレジスタとを有する。アドレス指定ユ
ニットは、両アドレス指定ユニットから１つの組合せア
ドレスを形成するときに用いるグローバルビット多重制
御レジスタを共有する。プログラムフロー制御ユニット
１３０は、命令ポートから命令を取り込むためのアドレ
ス生成を含む、デジタルイメージ・グラフィックプロセ
ッサ７１のためのプログラムフローを制御する。プログ
ラムフロー制御ユニット１３０は、１つのプログラムカ
ウンタＰＣ７０１と、アドレスパイプライン段において
現在の命令のアドレスを保持する１つのアドレス段命令
ポインタＩＲＡ７０２と、実行パイプライン段において
現在の命令のアドレスを保持する１つの実行段命令ポイ
ンタＩＲＥ７０３と、サブルーチンからの復帰用アドレ
スを保持する１つのサブルーチン復帰段命令ポインタＩ
ＲＲＳ７０４と、ゼロオーバーヘッドループを制御する
一組のレジスタと、対応命令キャッシュメモリにある４
ブロックの命令ワードの最上位ビットを保持する、集合
としては７０８と称せられる４つのタグレジスタＴＡＧ
３−ＴＡＧ０とを有する。FIG. 3 shows an overview of an exemplary digital image/graphics processor 71, which is virtually identical to digital image/graphics processors 72, 73 and 74. Digital image/graphics processor 71 includes a data unit 110, an address unit 120 and a program flow control unit 130. Data unit 110 performs the logical and arithmetic data operations. Data unit 110 has eight data registers D7-D0, one status register 210 and one multiple flags register 211. Address unit 120 controls the generation of load/store addresses for the local data port and the global data port. As described further below, address unit 120 has two virtually identical addressing units, one for local addressing and one for global addressing. Each of these addressing units has one all-"0" read-only register that enables absolute addressing in a relative address mode, one stack pointer, five address registers and three index registers. The addressing units share a global bit multiplex control register, which is used when forming one combined address from both addressing units. Program flow control unit 130 controls the program flow for digital image/graphics processor 71, including the generation of addresses for fetching instructions via the instruction port. Program flow control unit 130 has one program counter PC 701; one address stage instruction pointer IRA 702, which holds the address of the instruction currently in the address pipeline stage; one execute stage instruction pointer IRE 703, which holds the address of the instruction currently in the execute pipeline stage; one subroutine return instruction pointer IRRS 704, which holds the return address for a subroutine; a set of registers that control zero-overhead loops; and four tag registers TAG3-TAG0, collectively called 708, which hold the most significant bits of the four blocks of instruction words in the corresponding instruction cache memory.

【００２５】デジタルイメージ・グラフィックプロセッ
サ７１は、図４に示すような１つの３段パイプライン上
で動作する。データユニット１１０と、アドレスユニッ
ト１２０と、プログラムフロー制御ユニット１３０は、
１つの命令パイプラインにおいて異なる命令によって同
時に動作する。時間順に並べると３段とは、取り込み
段、アドレス指定段、および、実行段である。こうし
て、いかなるときにも、デジタルイメージ・グラフィッ
クプロセッサ７１は３つの命令の異なる機能に基づいて
動作していることになる。パイプライン段という語句
は、特定の事象が、ストール状態中ではなく、パイプラ
イン進行時に発生することを示すため、クロックサイク
ルと称する代わりに用いるものである。Digital image/graphics processor 71 operates on a three-stage pipeline as shown in FIG. 4. Data unit 110, address unit 120 and program flow control unit 130 operate simultaneously on different instructions in an instruction pipeline. In chronological order, the three stages are the fetch stage, the address stage and the execute stage. Thus, at any given time, digital image/graphics processor 71 is operating on different functions of three instructions. The phrase "pipeline stage" is used instead of "clock cycle" to indicate that a particular event occurs when the pipeline advances, and not during a stall condition.
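
The overlap of the three stages can be illustrated with a short sketch. The Python below is not part of the patent: the function name `pipeline_schedule` and the instruction labels are hypothetical, and stalls are deliberately not modeled, so one step here is one advance of the pipeline rather than one clock cycle.

```python
STAGES = ("fetch", "address", "execute")  # stage order as given in the text


def pipeline_schedule(instructions):
    """Return, for each pipeline step, the instruction occupying each stage.

    Hypothetical illustration only: stalls are ignored, so every step
    advances every instruction by exactly one stage.
    """
    steps = []
    total_steps = len(instructions) + len(STAGES) - 1
    for t in range(total_steps):
        row = {}
        for depth, stage in enumerate(STAGES):
            i = t - depth  # instruction i enters the fetch stage at step i
            row[stage] = instructions[i] if 0 <= i < len(instructions) else None
        steps.append(row)
    return steps


schedule = pipeline_schedule(["I0", "I1", "I2", "I3"])
for step, row in enumerate(schedule):
    print(step, row)
```

From the third step onward the three units are busy with three different instructions at once, which is the situation the paragraph describes.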

【００２６】プログラムフロー制御ユニット１３０は、
取り込みパイプライン段中に生じる全てのオペレーショ
ンを実行する。プログラムフロー制御ユニット１３０
は、プログラムカウンタと、ループロジックと、割込ロ
ジックと、パイプライン制御ロジックとを有する。取り
込みパイプライン段中に、次の命令ワードがメモリから
取り込まれる。次の命令ワードを命令キャッシュメモリ
２１に格納するかどうかを決定するために、プログラム
カウンタに含まれるアドレスがキャッシュタグレジスタ
と比較される。プログラムフロー制御ユニット１３０
は、次の命令ワードが存在するなら、命令キャッシュメ
モリ２１からその命令ワードを取り込むために、命令ポ
ートアドレスにプログラムカウンタ内のアドレスを供給
する。クロスバー５０は、対応する命令キャッシュ、こ
こでは命令キャッシュメモリ２１にこのアドレスを送信
し、それによって命令ワードを命令バス１３２上に戻
す。あるいは、キャッシュミスが発生すると、転送コン
トローラ８０は、次の命令ワードを得るために、外部メ
モリにアクセスする。プログラムカウンタは更新され
る。後続の命令ワードが次の連続アドレスにあるなら、
プログラムフロー制御ユニット１３０は、プログラムカ
ウンタを後ろにインクリメントする。あるいは、プログ
ラムフロー制御ユニット１３０は、ループロジックやソ
フトウェア分岐に応じて次の命令ワードのアドレスをロ
ードする。同期ＭＩＭＤモードが活動状態の場合、指定
されたデジタルイメージ・グラフィックプロセッサが全
て同期して、それが通信レジスタにおけるシンクビット
によって示されるまで、命令取り込みは待機となる。Program flow control unit 130 performs all the operations that occur during the fetch pipeline stage. Program flow control unit 130 has a program counter, loop logic, interrupt logic and pipeline control logic. During the fetch pipeline stage, the next instruction word is fetched from memory. The address contained in the program counter is compared with the cache tag registers to determine whether the next instruction word is stored in instruction cache memory 21. If it is, program flow control unit 130 supplies the address in the program counter as the instruction port address in order to fetch that instruction word from instruction cache memory 21. Crossbar 50 transmits this address to the corresponding instruction cache, here instruction cache memory 21, which returns the instruction word on instruction bus 132. Otherwise, a cache miss occurs, and transfer controller 80 accesses external memory to obtain the next instruction word. The program counter is then updated. If the following instruction word is at the next sequential address, program flow control unit 130 post-increments the program counter. Otherwise, program flow control unit 130 loads the address of the next instruction word as determined by the loop logic or a software branch. If synchronous MIMD mode is active, the instruction fetch waits until all the specified digital image/graphics processors are synchronized, as indicated by the sync bits in a communications register.
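
The tag comparison and program counter update described above might be sketched as follows. This is a rough illustration under stated assumptions, not the circuit of the patent: the block size `BLOCK_BYTES`, byte addressing, and the helper names `lookup` and `next_pc` are all hypothetical. The 8-byte step assumes one 64-bit instruction word per fetch.

```python
BLOCK_BYTES = 512  # assumed cache block size; the text does not state one
WORD_BYTES = 8     # one 64-bit instruction word, assuming byte addressing


def lookup(pc, tags):
    """Compare the upper address bits of pc with each tag register.

    Returns the index of the matching cache block, or None on a miss
    (in which case external memory would have to be accessed).
    """
    block_address = pc // BLOCK_BYTES  # most significant bits of the address
    for index, tag in enumerate(tags):
        if tag == block_address:
            return index
    return None


def next_pc(pc, branch_target=None):
    """Sequential fetch steps to the next word; a loop or branch loads a new address."""
    return branch_target if branch_target is not None else pc + WORD_BYTES


tags = [0x10, 0x11, None, None]  # four tag registers, two of them still empty
hit = lookup(0x10 * BLOCK_BYTES + 8, tags)
miss = lookup(0x12 * BLOCK_BYTES, tags)
```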

【００２７】アドレスユニット１２０は、アドレスパイ
プライン段のアドレス計算をすべて行う。アドレスユニ
ット１２０は２つの独立したアドレスユニットを有す
る。１つはグローバルポート用であり、もう１つはロー
カルポート用である。命令が１つあるいは２つのメモリ
アクセスを呼び出す場合、アドレスユニット１２０はア
ドレスパイプライン段中にアドレスを発生する。アドレ
スは、競合検出・優先順位付けのために、グローバルポ
ートアドレスバス１２１とローカルポートアドレスバス
１２２のそれぞれを経由してクロスバー５０に供給され
る。競合が無いなら、アクセスされたメモリは要求され
たアクセスを許可する準備をするが、メモりアクセスは
後続の実行パイプライン段中に起こる。Address unit 120 performs all the address calculations of the address pipeline stage. Address unit 120 has two independent address units, one for the global port and one for the local port. When an instruction calls for one or two memory accesses, address unit 120 generates the addresses during the address pipeline stage. The addresses are supplied to crossbar 50 via global port address bus 121 and local port address bus 122, respectively, for contention detection and prioritization. If there is no contention, the accessed memory prepares to grant the requested access, but the memory access itself occurs during the following execute pipeline stage.

【００２８】データユニット１１０は、実行パイプライ
ン段中のすべての論理および算術演算を行う。すべての
論理および算術演算と、メモリへの或いはメモリからの
すべてのデータ移動は、実行パイプライン段中に起こ
る。グローバルデータポートとローカルデータポート
は、アドレスパイプライン段中に開始されたメモリアク
セスを実行パイプライン段に完了する。グローバルデー
タポートとローカルデータポートは、メモリストアに必
要なすべてのデータ整列と、メモリロードに必要なすべ
てのデータ抽出および符号拡張とを行う。実行パイプラ
イン段の演算中にプログラムカウンタがデータ宛先とし
て指定されると、分岐が実施される前に２命令遅延が行
われる。こうした分岐命令の後に続く次の２命令は既に
取り込まれているので、パイプライン演算にはこの遅延
が必要である。ＲＩＳＣプロセッサで実施する場合、他
の有用な命令を２遅延スロット位置に配置しても良い。Data unit 110 performs all the logical and arithmetic operations of the execute pipeline stage. All logical and arithmetic operations, and all data movement to or from memory, occur during the execute pipeline stage. The global data port and the local data port complete, during the execute pipeline stage, any memory access that was begun during the address pipeline stage. The global data port and the local data port perform all the data alignment needed for memory stores, and all the data extraction and sign extension needed for memory loads. If the program counter is specified as the data destination of an operation in the execute pipeline stage, a delay of two instructions occurs before the branch takes effect. The pipelined operation requires this delay because the two instructions following such a branch instruction have already been fetched. As is the practice with RISC processors, other useful instructions may be placed in the two delay slot positions.

【００２９】デジタルイメージ・グラフィックプロセッ
サ７１は３つの内部３２ビットデータバスを有する。こ
れらは、ローカルポートデータバスＬｂｕｓ１０３とグ
ローバルポートソースデータバスＧｓｒｃ１０５とグロ
ーバルポート宛先データバスＧｄｓｔ１０７である。こ
れら３つのバスはデータユニット１１０とアドレスユニ
ット１２０とプログラムフロー制御ユニット１３０とを
相互連結する。これら３つのバスは、ローカルポート１
４１とグローバルポート１４５を有するデータポートユ
ニット１４０にも接続されている。データポートユニッ
ト１４０はメモリアクセスを提供するクロスバー５０に
結合されている。Digital image/graphics processor 71 has three internal 32-bit data buses: local port data bus Lbus 103, global port source data bus Gsrc 105 and global port destination data bus Gdst 107. These three buses interconnect data unit 110, address unit 120 and program flow control unit 130. They are also connected to a data port unit 140 having a local port 141 and a global port 145. Data port unit 140 is coupled to crossbar 50, which provides memory access.

【００３０】ローカルデータポート１４１はメモリへの
データ格納用のバッファ１４２を有する。マルチプレク
サ・バッファ回路１４３は、ローカルポートデータバス
１４４から、あるいはメモリからクロスバー５０を経由
して、あるいはローカルポートアドレスバス１２２か
ら、あるいはグローバルポートデータバス１４８からＬ
ｂｕｓ１０３上にデータをロードする。ローカルポート
データバスＬｂｕｓ１０３は、その後、レジスタ起源
（ストア）あるいはメモリ起源（ロード）の３２ビット
データを搬送する。好都合なことに、データユニット１
１０の算術演算を補うためにアドレスユニット１２０に
おける算術結果をローカルポートアドレスバス１２２、
マルチプレクサバス１４３経由でローカルポートデータ
バスＬｂｕｓ１０３に供給できる。このことを以下さら
に説明する。バッファ１４２とマルチプレクサ・バッフ
ァ１４３はデータの整列と抽出を行う。ローカルポート
データバスＬｂｕｓ１０３はデータユニット１１０内の
データレジスタに接続する。ローカルバス一時保持レジ
スタＬＴＤ１０４もまたローカルポートデータバスＬｂ
ｕｓ１０３に接続されている。Local data port 141 has a buffer 142 for data stores to memory. Multiplexer buffer circuit 143 loads data onto Lbus 103 from local port data bus 144, from memory via crossbar 50, from local port address bus 122, or from global port data bus 148. Local port data bus Lbus 103 thus carries 32-bit data that is either register sourced (stores) or memory sourced (loads). Advantageously, arithmetic results in address unit 120 can be supplied via local port address bus 122 and multiplexer buffer 143 to local port data bus Lbus 103 to supplement the arithmetic operations of data unit 110. This is described further below. Buffer 142 and multiplexer buffer 143 perform data alignment and extraction. Local port data bus Lbus 103 connects to the data registers in data unit 110. A local bus temporary holding register LTD 104 is also connected to local port data bus Lbus 103.

【００３１】グローバルポートソースデータバスＧｓｒ
ｃ１０５とグローバルポート宛先データバスＧｄｓｔ１
０７はグローバルデータ転送を仲介する。これらのグロ
ーバルデータ転送は、メモリアクセス、レジスタからレ
ジスタへの移動、プロセッサ間の命令ワード転送のどれ
であっても良い。グローバルポートソースデータバスＧ
ｓｒｃ１０５はグローバルポートデータ転送の３２ビッ
トソース情報を搬送する。データソースは、デジタルイ
メージ・グラフィックプロセッサ７１のレジスタのどれ
か、あるいは、デジタルイメージ・グラフィックプロセ
ッサ７１、７２、７３あるいは７４のどれかに対応する
パラメータメモリのどれかのデータであり得る。データ
はグローバルポート１４５経由でメモりに格納される。
マルチプレクサバッファ１４６はローカルポートデータ
バスＬｂｕｓ１０３あるいはグローバルポートソースデ
ータバスＧｓｒｃ１０５から線路を選択し、データ整列
を行う。マルチプレクサバッファ１４６はクロスバー５
０経由でメモリに印加するためにこのデータをグローバ
ルポートデータバス１４８上に書き込む。グローバルポ
ートソースデータバスＧｓｒｃ１０５もまたデータをデ
ータユニット１１０に供給して、グローバルポートソー
スデータバスＧｓｒｃ１０５のデータを算術論理ユニッ
トソースの１つとして使用できるようにする。この後者
の接続によって、デジタルイメージ・グラフィックプロ
セッサ７１のどのレジスタも算術論理ユニット演算のた
めのソースになることができるようにされる。Global port source data bus Gsrc 105 and global port destination data bus Gdst 107 mediate global data transfers. These global data transfers may be memory accesses, register-to-register moves, or instruction word transfers between processors. Global port source data bus Gsrc 105 carries the 32-bit source information of a global port data transfer. The data source may be any of the registers of digital image/graphics processor 71, or data in any of the parameter memories corresponding to any of digital image/graphics processors 71, 72, 73 or 74. The data is stored to memory via global port 145. Multiplexer buffer 146 selects lines from either local port data bus Lbus 103 or global port source data bus Gsrc 105 and performs data alignment. Multiplexer buffer 146 writes this data onto global port data bus 148 for application to memory via crossbar 50. Global port source data bus Gsrc 105 also supplies data to data unit 110, so that the data on Gsrc 105 can be used as one of the arithmetic logic unit sources. This latter connection allows any register of digital image/graphics processor 71 to be a source for an arithmetic logic unit operation.

【００３２】グローバルポート宛先データバスＧｄｓｔ
１０７はグローバルバスデータ転送の３２ビット宛先デ
ータを搬送する。宛先はデジタルイメージ・グラフィッ
クプロセッサ７１のレジスタのどれかである。グローバ
ルポート１４５のバッファ１４７はグローバルポート宛
先データバスＧｄｓｔ１０７のデータのソーシングを行
う。バッファ１４７は必要とされるデータ抽出および符
号拡張オペレーションを実行する。このバッファ１１５
は、データソースがメモリで、そのためロードが行われ
ている場合に動作する。算術論理ユニット結果がグロー
バルポート宛先データバスＧｄｓｔ１０７のための代替
データソースとしての役割を果たす。このことによっ
て、デジタルイメージ・グラフィックプロセッサ７１の
どのレジスタも算術論理ユニットオペレーションの宛先
となることができるようにされる。グローバルバス一時
保持レジスタＧＴＤ１０８もまたグローバルポート宛先
データバスＧｄｓｔ１０７に接続されている。Global port destination data bus Gdst 107 carries the 32-bit destination data of a global bus data transfer. The destination is any of the registers of digital image/graphics processor 71. Buffer 147 in global port 145 sources the data on global port destination data bus Gdst 107. Buffer 147 performs any required data extraction and sign extension operations. This buffer 147 operates when the data source is memory, that is, when a load is being performed. The arithmetic logic unit result serves as an alternative data source for global port destination data bus Gdst 107. This allows any register of digital image/graphics processor 71 to be the destination of an arithmetic logic unit operation. A global bus temporary holding register GTD 108 is also connected to global port destination data bus Gdst 107.

【００３３】マルチプレクサバッファ１４３および１４
６を含むサーキットリーは、レジスタに対してレジスタ
移動を提供するために、グローバルポートソースデータ
バスＧｓｒｃ１０５とグローバルポート宛先データバス
Ｇｄｓｔ１０７の間に接続している。これによって、デ
ジタルイメージ・グラフィックプロセッサ７１のどのレ
ジスタからもグローバルポートソースデータバスＧｓｒ
ｃ１０５上へのリードをグローバルポート宛先データバ
スＧｄｓｔ１０７経由でデジタルイメージ・グラフィッ
クプロセッサ７１のどのレジスタへも書き込むことがで
きるようになる。Circuitry including multiplexer buffers 143 and 146 is connected between global port source data bus Gsrc 105 and global port destination data bus Gdst 107 to provide register-to-register moves. This allows a read from any register of digital image/graphics processor 71 onto global port source data bus Gsrc 105 to be written to any register of digital image/graphics processor 71 via global port destination data bus Gdst 107.

【００３４】注目すべきことは、有利なことに、グロー
バルポート宛先データバスＧｄｓｔ１０７を経由してメ
モリからデジタルイメージ・グラフィックプロセッサ７
１のどのレジスタのロードをも実行しながら、同時にグ
ローバルポートソースデータバスＧｓｒｃ１０５を経由
してどのレジスタからもデータユニット１１０内の算術
論理ユニットをソーシングできるということである。同
様に、グローバルポートソースデータバスＧｓｒｃ１０
５を経由してメモリへデジタルイメージ・グラフィック
プロセッサ７１のどのレジスタ内のデータをもストアし
ながら、グローバルポート宛先データバスＧｄｓｔ１０
７を経由してデジタルイメージ・グラフィックプロセッ
サ７１のどのレジスタへも算術論理ユニット演算の結果
をセーブできるということである。これらのデータ転送
の有用性は、以下さらに詳述する。Note that, advantageously, any register of digital image/graphics processor 71 can be loaded from memory via global port destination data bus Gdst 107 while the arithmetic logic unit in data unit 110 is simultaneously sourced from any register via global port source data bus Gsrc 105. Similarly, the data in any register of digital image/graphics processor 71 can be stored to memory via global port source data bus Gsrc 105 while the result of an arithmetic logic unit operation is saved to any register of digital image/graphics processor 71 via global port destination data bus Gdst 107. The usefulness of these data transfers is detailed further below.

【００３５】プログラムフロー制御ユニット１３０は命
令キャッシュメモリ２１から命令バス１３２を経由して
取り込まれた命令ワードを受信する。この取り込まれた
命令ワードは、有利なことに、アドレス段命令レジスタ
ＩＲＡ７５１および実行段命令レジスタＩＲＥ７５２と
称せられる２つの６４ビット命令レジスタに格納され
る。命令レジスタＩＲＡおよびＩＲＥは、どちらも、そ
れぞれの内容を復号し配送させる。デジタルイメージ・
グラフィックプロセッサ７１は、復号した、あるいは、
部分的に復号した命令内容をデータユニット１１０およ
びアドレスユニット１２０に搬送する命令コードバス１
３３を有する。後述のように、１命令ワードは３２ビッ
トあるいは１５ビットあるいは３ビットの即値フィール
ドを有し得る。プログラムフロー制御ユニット１３０は
こうした即値フィールドをグローバルポートソースデー
タバスＧｓｒｃ１０５へ、その宛先に供給するために発
送する。Program flow control unit 130 receives the instruction words fetched from instruction cache memory 21 via instruction bus 132. A fetched instruction word is advantageously stored in two 64-bit instruction registers, called address stage instruction register IRA 751 and execute stage instruction register IRE 752. Instruction registers IRA and IRE each have their contents decoded and distributed. Digital image/graphics processor 71 has an opcode bus 133 that carries the decoded or partially decoded instruction contents to data unit 110 and address unit 120. As described below, an instruction word may include a 32-bit, a 15-bit or a 3-bit immediate field. Program flow control unit 130 routes such an immediate field to global port source data bus Gsrc 105 for supply to its destination.

【００３６】デジタルイメージ・グラフィックプロセッ
サ７１は３つのアドレスバス１２１、１２２および１３
１を有する。アドレスユニット１２０はグローバルポー
トアドレスバス１２１およびローカルポートアドレスバ
ス１２２上にアドレスを生成する。以下さらに詳述する
ように、アドレスユニット１２０は、グローバルポート
アドレスバス１２１およびローカルポートアドレスバス
１２２上にアドレスをそれぞれ提供する別個のグローバ
ルおよびローカルアドレスユニットを有する。注意すべ
きは、ローカルアドレスユニット６２０は当該デジタル
イメージ・グラフィックプロセッサに対応するデータメ
モリ以外のメモリにもアクセスして良いということであ
る。その場合、ローカルアドレスユニットのアクセスは
グローバルポートアドレスバス１２１経由で行われる。
プログラムフロー制御ユニット１３０は、プログラムカ
ウンタとキャッシュ制御ロジックからのアドレスビット
の組合せから命令ポートアドレスバス１３１上の命令ア
ドレスのソーシングを行う。これらのアドレスバス１２
１、１２２および１３１は、それぞれ、アドレス、バイ
トストローブ、および、読取り書込み情報を搬送する。Digital image/graphics processor 71 has three address buses 121, 122 and 131. Address unit 120 generates addresses on global port address bus 121 and local port address bus 122. As detailed further below, address unit 120 includes separate global and local address units, which provide the addresses on global port address bus 121 and local port address bus 122, respectively. Note that local address unit 620 may access memory other than the data memory corresponding to that digital image/graphics processor. In that case, the local address unit's access is made via global port address bus 121. Program flow control unit 130 sources the instruction address on instruction port address bus 131 from a combination of address bits from the program counter and the cache control logic. Address buses 121, 122 and 131 each carry address, byte strobe and read/write information.

【００３７】図５はマスタープロセッサ６０の単純化し
た図を示している。マスタープロセッサ６０の主なブロ
ックは、浮動小数点ユニット（ＦＰＵ）２０１と、レジ
スタファイル（ＲＦ）２０２と、浮動小数点演算とメモ
リロードの結果がデータキャッシュと浮動小数点ユニッ
ト２０１の間でソースおよび属性としてレジスタファイ
ル２０２へのそれらの共用書込みポートへのアクセスの
ために使用される前にそれらの結果を確実に利用可能な
ものとするレジスタスコアボード（ＳＢ）２０３と、ク
ロスバーを経由するチップ上メモリへのインターフェイ
スおよび転送プロセッサ８０を経由する外部メモリへの
インターフェイスをも統御するデータキャッシュコント
ローラ２０４と、シフト命令を実行するバレルシフタ
（ＢＳ）２０５と、ゼロ比較ロジック２０６と、最左・
最右検出ロジック（ＬＭＯ／ＲＭＯ）２０７と、加算、
減算および論理演算に使用され、しかも相対分岐中には
分岐対象アドレスを計算するために使用される整数算術
論理ユニット（ＡＬＵ）２０８と、マスタープロセッサ
割込み信号を受信する割込み保留レジスタ（ＩＮＴＰＥ
Ｎ）２０９と、割込みを選択的に可能あるいは不可能に
する割込み認可レジスタ（ＩＥ）２１０と、取り込まれ
るべき命令のアドレスを保持するプログラムカウンタレ
ジスタ（ＰＣ）２１１と、次の命令に向けてプログラム
カウンタ２１１をインクリメントし、その増分値を「リ
ターン」あるいは「リンク」アドレスとしてレジスタフ
ァイルに発送することもできるプログラムカウンタイン
クリメンタ（ＩＮＣ）２１２と、命令を復号し、制御信
号を動作ユニットに供給する命令復号ロジック（ＤＥＣ
ＯＤＥ）２１３と、実行中の命令のアドレスを保持する
命令レジスタ（ＩＲ）２１４と、いかなる命令即値デー
タも格納する即値レジスタ（ＩＭＭ）２１５と、キャッ
シュフィルのために転送プロセッサ８０へのインターフ
ェイスに実行すべき命令を提供する命令キャッシュコン
トローラ（ＩＣＡＣＨＥ）２１６とである。FIG. 5 shows a simplified diagram of master processor 60. The main blocks of master processor 60 are: a floating point unit (FPU) 201; a register file (RF) 202; a register scoreboard (SB) 203, which ensures that the results of floating point operations and memory loads are available before they are used as sources, and which arbitrates between the data cache and floating point unit 201 for access to their shared write port to register file 202; a data cache controller 204, which also handles the interface to on-chip memory via the crossbar and to external memory via transfer processor 80; a barrel shifter (BS) 205, which executes shift instructions; zero compare logic 206; leftmost one/rightmost one detection logic (LMO/RMO) 207; an integer arithmetic logic unit (ALU) 208, which is used for add, subtract and logical operations and also computes the branch target address during relative branches; an interrupt pending register (INTPEN) 209, which receives the master processor interrupt signals; an interrupt enable register (IE) 210, which selectively enables or disables interrupts; a program counter register (PC) 211, which holds the address of the instruction to be fetched; a program counter incrementer (INC) 212, which increments program counter 211 to point to the next instruction and can also route the incremented value to the register file as a "return" or "link" address; instruction decode logic (DECODE) 213, which decodes instructions and supplies control signals to the operating units; an instruction register (IR) 214, which holds the address of the instruction being executed; an immediate register (IMM) 215, which stores any instruction immediate data; and an instruction cache controller (ICACHE) 216, which provides the instructions to be executed and interfaces to transfer processor 80 for cache fills.

【００３８】図６はマスタープロセッサ６０で使用され
る基本パイプラインを示している。マスタープロセッサ
６０は取り込み、実行およびメモリ段を含む３段パイプ
ラインを有する。図６はどのようにして３つの命令がパ
イプラインを通過するかを示している。パイプラインの
取り込み段中にプログラムカウンタ２１０が命令キャッ
シュをアドレス指定して３２ビット命令を読み取る。実
行段中に命令が復号され、ソースオペランドがレジスタ
ファイルから読み取られ、演算が実行され、結果がレジ
スタファイルに書き戻される。メモリ段は演算のロード
とストアのためにだけにある。実行段中に計算されたア
ドレスがデータキャッシュのアドレス指定に使用され、
データが読み書きされる。命令キャッシュでミスが発生
すると、取り込み及び実行段は要求が満たされるまで停
止される。データキャッシュでミスが発生すると、別の
メモリオペレーションを開始する必要が生じるまで、メ
モリパイプラインが停止するが、取り込み及び実行段は
流れを継続する。FIG. 6 shows the basic pipeline used in master processor 60. Master processor 60 has a three-stage pipeline consisting of fetch, execute and memory stages. FIG. 6 shows how three instructions pass through the pipeline. During the fetch stage of the pipeline, program counter 211 addresses the instruction cache and a 32-bit instruction is read. During the execute stage, the instruction is decoded, the source operands are read from the register file, the operation is performed, and the result is written back to the register file. The memory stage is present only for load and store operations. The address calculated during the execute stage is used to address the data cache, and the data is read or written. If a miss occurs in the instruction cache, the fetch and execute stages stall until the request is serviced. If a miss occurs in the data cache, the memory pipeline stalls, while the fetch and execute stages continue to flow until another memory operation needs to be initiated.

【００３９】図７は浮動小数点ユニット２０１のための
基本パイプラインを示している。取り込み段は前述の整
数演算の取り込み段と同様である。浮動小数点命令のア
ンパック段中に、ソースオペランド、命令コード、精度
および宛先アドレスを含む、浮動小数点演算を開始する
のに必要なすべてのデータが到着する。２つのソースオ
ペランドはレジスタファイルから読み取られる。オペラ
ンドはその後符号、べき指数、仮数フィールドにアンパ
ックされ、特殊ケースの検出が行われる。入力例外はこ
のサイクルで検出される。しかも、入力例外は浮動小数
点ユニット２０１を通過させられ、単一精度出力例外と
して同一サイクルで合図される。シグナリング・ノット
・ア・ナンバー（Not-A-Number）、クワイエット・ノッ
ト・ア・ナンバー、無限、異常値およびゼロを含む他の
特殊ケースも検出され、この情報は、利用者には見えな
いが、手動小数点ユニット２０１の異なるパイプライン
段を通してデータに後続する。FIG. 7 shows the basic pipeline for floating point unit 201. The fetch stage is the same as the fetch stage for integer operations described above. During the unpack stage of a floating point instruction, all the data needed to begin the floating point operation arrives, including the source operands, the opcode, the precision and the destination address. The two source operands are read from the register file. The operands are then unpacked into sign, exponent and mantissa fields, and special cases are detected. Input exceptions are detected in this cycle. Input exceptions are then piped through floating point unit 201 and signaled in the same cycle as a single precision output exception. Other special cases, including signaling not-a-number, quiet not-a-number, infinity, denormalized values and zero, are also detected; this information is not visible to the user, but follows the data through the different pipeline stages of floating point unit 201.
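
The unpack step can be sketched in software. The text does not name a number format, so the example below assumes IEEE 754 single precision, with the most significant mantissa bit marking a quiet not-a-number. The function name `unpack_single` is hypothetical and the code only illustrates the field split and the special-case classes listed above.

```python
import struct


def unpack_single(value):
    """Split a float (rounded to IEEE 754 single precision) into fields.

    Returns (sign, exponent, mantissa, kind), where kind names the
    special-case class detected from the exponent and mantissa fields.
    """
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF:
        if mantissa == 0:
            kind = "infinity"
        elif mantissa & 0x400000:  # assumed convention: top mantissa bit set
            kind = "quiet NaN"
        else:
            kind = "signaling NaN"
    elif exponent == 0:
        kind = "zero" if mantissa == 0 else "denormal"
    else:
        kind = "normal"
    return sign, exponent, mantissa, kind


print(unpack_single(1.0))
print(unpack_single(float("inf")))
```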

【００４０】すべての計算はオペレート段で行われる。
命令の種類によるが、オペレート段では数サイクル必要
になることもある。All calculations are performed in the operate stage. Depending on the type of instruction, the operate stage may require several cycles.

【００４１】浮動小数点ユニット２０１の結果が決定さ
れると、この浮動小数点演算に関する個々の情報のうち
に、浮動小数点ステータスレジスタに記録されるものも
ある。浮動小数点命令はどれも浮動小数点ステータスレ
ジスタに一度だけ書き込まれる。Once the result of floating point unit 201 has been determined, some of the individual pieces of information about the floating point operation are recorded in the floating point status register. Every floating point instruction writes to the floating point status register only once.

【００４２】図８は、ポストスクリプトのようなページ
記述言語で明記された文書を印刷しようとするとき一般
に実行されるステップを示している。印刷ファイル（入
力データファイル３０１）の受信に続くのが翻訳である
（処理ブロック３０２）。このステップでは、入力ポス
トスクリプトファイルがディスプレイリスト（データフ
ァイル３０３）と呼ばれる中間型式に翻訳変換される。
ディスプレイリスト３０３は、台形、フォント、イメー
ジなど記述ページを作成する低水準基本語のリストから
なる。次にディスプレイリストがレンダリングされる
（処理ブロック３０４）。ディスプレイリストの各要素
はこのステップで処理され、出力がページバッファ（デ
ータファイル３０５）としてバッファに書き込まれる。
ページバッファ３０５は特定の１色中心面に関して出力
イメージの一部を表現する。ページバッファ３０５で
は、各画素は一般に８ビットで表現される。ディスプレ
イリスト３０５の要素すべてが処理されると、ページバ
ッファ３０５は出力イメージを８ビットフォーマットで
有する。次にページバッファがスクリーニングされる
（処理ブロック３０６）。印刷装置が支持する解像度は
画素あたり１から８ビットの間のどれであっても良い。
レンダリングステップ３０４で展開されたページバッフ
ァ３０５は、プリンタが支持する解像度に変換されなけ
ればならない。こうして変換されたデータはデバイスイ
メージと呼ばれる。ページバッファ３０５内の各画素は
それぞれ対応のデバイス画素値に変換されなければなら
ない。例えば、４ビットデバイス画素の場合、ページバ
ッファ３０５の各画素は４ビット値に変換されなければ
ならない。このスクリーニングと呼ばれる処理の結果、
スクリーニング済みページバッファ（データファイル３
０７）を生じる。次に印刷となる（処理ブロック３０
８）。スクリーニング済みページバッファ３０７の各画
素は用紙に印刷される。この処理はシアン、イエロー、
マゼンタおよびブラックの色中心面すべてに関して繰り
返される。FIG. 8 shows the steps typically performed when printing a document specified in a page description language such as PostScript. Receipt of the print file (input data file 301) is followed by interpretation (processing block 302). In this step, the input PostScript file is interpreted and converted into an intermediate form called a display list (data file 303). Display list 303 consists of a list of low-level primitives, such as trapezoids, fonts and images, that make up the described page. Next, the display list is rendered (processing block 304). Each element of the display list is processed in this step, and the output is written to a buffer called the page buffer (data file 305). Page buffer 305 represents a portion of the output image for one particular color plane. In page buffer 305, each pixel is typically represented by 8 bits. Once all the elements of display list 303 have been processed, page buffer 305 holds the output image in 8-bit format. Next, the page buffer is screened (processing block 306). The resolution supported by the printing device may be anywhere from 1 to 8 bits per pixel. Page buffer 305 produced in rendering step 304 must be converted to the resolution supported by the printer. The data converted in this way is called the device image. Each pixel in page buffer 305 must be converted to its corresponding device pixel value. For example, for 4-bit device pixels, each pixel in page buffer 305 must be converted to a 4-bit value. This process, called screening, produces a screened page buffer (data file 307). Next comes printing (processing block 308). Each pixel of screened page buffer 307 is printed on paper. This process is repeated for each of the cyan, yellow, magenta and black color planes.
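
To make the bit-depth conversion concrete, the sketch below maps an 8-bit page buffer value to an n-bit device value by plain uniform quantization. This is a simplified stand-in and not the screening method of the patent, which works with a preference matrix; the function name `to_device_pixel` is hypothetical.

```python
def to_device_pixel(value, bits):
    """Map an 8-bit value (0..255) to a device value (0..2**bits - 1).

    Uniform quantization with rounding to nearest; a real screening step
    would also depend on the pixel position on the page.
    """
    if not 0 <= value <= 255:
        raise ValueError("page buffer pixels are 8-bit values")
    if not 1 <= bits <= 8:
        raise ValueError("device depth must be 1 to 8 bits per pixel")
    levels = (1 << bits) - 1
    return (value * levels + 127) // 255


row = [0, 17, 128, 255]
device_row = [to_device_pixel(p, 4) for p in row]  # 4-bit device pixels
```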

【００４３】典型的なページの出力における各ページは
８インチｘ１１．５インチである。印刷密度がインチあ
たり６００画素なら、そのページは３３００万画素を含
む。各画素がスクリーニングされる必要がある。１画素
のスクリーニングにＴ時間掛かるとすると、特定の１色
中心面に関して１ページ全部をスクリーニングするには
合計３３００万Ｔ時間となる。この方法に伴う問題は、
値が０の画素、すなわち、レンダリングモジュールの出
力ではない画素もまたスクリーニングされるということ
である。典型的なページでは、有効画素率は全画素数の
数分の１にすぎない。そういうわけで多くの画素は０値
を持つ。表１は様々なページ型式について利用印刷領域
の百分率の評価を表にしたものである。Each page of typical page output measures 8 inches by 11.5 inches. At a print density of 600 pixels per inch, the page contains about 33 million pixels. Each pixel needs to be screened. If screening one pixel takes time T, then screening an entire page for one particular color plane takes a total of 33 million T. The problem with this approach is that pixels with a value of 0, that is, pixels that were not written by the rendering module, are also screened. On a typical page, the valid pixels are only a fraction of the total number of pixels, so many pixels have a value of 0. Table 1 lists estimates of the percentage of the page area that is actually printed for various page types.

【００４４】[0044]

【表１】 [Table 1]

【００４５】ページの４０％だけがレンダリングモジュ
ールによって書かれているとすると、ページの６０％が
不必要なスクリーニングを受ける。これは、無用のスク
リーニングに費やされたものの合計が３３００万Ｔ単位
の６０％、つまり１９００Ｔ単位になるということであ
る。テキストページの場合、ページの約３０％だけが印
刷領域になる。こうしてテキストページについてはスク
リーニング時間の７０％がブランク領域に費やされて無
駄になる。グラフィックおよびイメージ情報を有するペ
ージの場合の潜在的な利得は小さくなるが、なおかなり
のものである。If only 40% of the page is written by the rendering modules, then 60% of the page undergoes unnecessary screening. The time wasted on useless screening is therefore 60% of 33 million T, or about 19.8 million T. For a text page, only about 30% of the page is printed area, so 70% of the screening time for a text page is wasted on blank areas. For pages containing graphics and image information the potential gain is smaller, but still considerable.
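
The figures in the two preceding paragraphs can be checked with a few lines of arithmetic. The helper names below are hypothetical; the page size, print density and coverage percentages are the ones stated in the text. The exact pixel count is 33.12 million, which the text rounds to 33 million.

```python
def page_pixels(width_in, height_in, pixels_per_inch):
    """Total pixels on the page for one color plane."""
    return round(width_in * pixels_per_inch) * round(height_in * pixels_per_inch)


def wasted_screening(total_pixels, printed_percent):
    """Screening time, in units of T, spent on pixels that were never rendered."""
    return total_pixels * (100 - printed_percent) // 100


total = page_pixels(8, 11.5, 600)         # 4800 x 6900 pixels
wasted_40 = wasted_screening(total, 40)   # page that is 40% covered
wasted_30 = wasted_screening(total, 30)   # text page, about 30% covered
print(total, wasted_40, wasted_30)
```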

【００４６】本発明の方法はこの損失を克服する。本発
明は２つの方法のうちの１つによってページにおけるブ
ランク領域と印刷領域を識別する。第１の方法はディス
プレイリスト要素のバウンディングボックス内の領域だ
けをスクリーニングする。第２の方法は印刷画素を有す
る走査線を識別する。The method of the present invention overcomes this waste. The present invention distinguishes the blank areas of a page from the printed areas by one of two methods. The first method screens only the areas within the bounding boxes of the display list elements. The second method identifies the scan lines that contain printed pixels.

【００４７】図９は、有効およびブランク印刷領域を識
別するバウンディングボックス法の応用例を示してい
る。各レンダリングモジュール４０１はレンダリング対
象を包囲するバウンディングボックスを用意する。例え
ば、台形要素を処理するレンダリングモジュール４０１
はページバッファに書き込まれている台形を囲むバウン
ディングボックス４０３を用意する。同様に、フォント
レンダリングモジュール４０１もレンダリングされたフ
ォントの入力文字に対してバウンディングボックス４０
５を用意する。FIG. 9 shows an example application of the bounding box method for distinguishing valid print areas from blank ones. Each rendering module 401 prepares a bounding box that encloses the object being rendered. For example, the rendering module 401 that processes trapezoid elements prepares a bounding box 403 enclosing the trapezoid written into the page buffer. Similarly, the font rendering module 401 prepares a bounding box 405 for each input character of the rendered font.

【００４８】各レンダリングモジュール４０１の出力
は、ページバッファ内のレンダリング済み要素にそのレ
ンダリング済み要素を包含するバウンディングボックス
のパラメータを加えたものになる。ディスプレイリスト
が処理されると、こうしたバウンディングボックスのリ
ストがスクリーニングモジュール４０７に対して与えら
れる。スクリーニングモジュール４０７は各バウンディ
ングボックス４０３および４０５を考慮する。スクリー
ニングモジュール４０７はバウンディングボックス内の
画素だけをスクリーニングし、出力を印刷オペレーショ
ン４１１のための４ビット出力ページバッファ４０９内
に書き込む。The output of each rendering module 401 is the rendered element in the page buffer plus the parameters of the bounding box containing the rendered element. When the display list is processed, a list of such bounding boxes is provided to the screening module 407. Screening module 407 considers each bounding box 403 and 405. Screening module 407 only screens the pixels in the bounding box and writes the output into a 4-bit output page buffer 409 for print operation 411.
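
A minimal sketch of the bounding box method follows. The names `screen_bounding_boxes` and `screen_pixel` are hypothetical, boxes are taken to be inclusive `(x0, y0, x1, y1)` rectangles, and the `done` mask that prevents overlapping boxes from being screened twice is an addition of this sketch, not something the text specifies.

```python
def screen_bounding_boxes(page, boxes, screen_pixel):
    """Screen only the pixels that lie inside at least one bounding box.

    page: list of rows of 8-bit values.
    boxes: iterable of (x0, y0, x1, y1) with inclusive corners.
    screen_pixel: callable (value, x, y) -> device value.
    Returns (output buffer, number of pixels screened); pixels outside
    every box are left at 0 without being screened.
    """
    height, width = len(page), len(page[0])
    out = [[0] * width for _ in range(height)]
    done = [[False] * width for _ in range(height)]  # sketch-only overlap guard
    screened = 0
    for x0, y0, x1, y1 in boxes:
        for y in range(max(y0, 0), min(y1, height - 1) + 1):
            for x in range(max(x0, 0), min(x1, width - 1) + 1):
                if not done[y][x]:
                    out[y][x] = screen_pixel(page[y][x], x, y)
                    done[y][x] = True
                    screened += 1
    return out, screened


page = [[255] * 4 for _ in range(4)]
boxes = [(0, 0, 1, 1), (1, 1, 2, 2)]  # two overlapping boxes
out, screened = screen_bounding_boxes(page, boxes, lambda v, x, y: v >> 4)
```

On this 16-pixel page only 7 pixels are screened, compared with 16 for a full-page pass.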

【００４９】図１０は有効およびブランク印刷領域を識
別する走査線法の応用例を示している。個々のモジュー
ルに対してバウンディングボックスを用意し、各レンダ
リング対象に対して個々のバウンディングボックスをス
クリーニングすることには問題があり得る。複雑な図形
に対しては、小さな重複バウンディングボックスが多く
できることがある。飾り付きのテキストもまた重複バウ
ンディングボックスを生じるかもしれない。その結果バ
ウンディングボックス法では多くの領域を消去できなく
なることがある。さらにまた、殆どのスクリーニング手
段は完全な１本の走査線のように長く連続したデータに
対して作用するときに効果的である。このような場合、
バウンディングボックス法は効果が薄くなることがあ
る。FIG. 10 shows an example application of the scan line method for distinguishing valid print areas from blank ones. Preparing a bounding box in each module and screening an individual bounding box for each rendered object can be problematic. A complex figure may produce many small overlapping bounding boxes. Decorated text may also produce overlapping bounding boxes. As a result, the bounding box method may fail to eliminate many areas. Furthermore, most screening implementations are efficient when they operate on long contiguous runs of data, such as a complete scan line. In such cases, the bounding box method may be less effective.

【００５０】走査線法は、有効画素を有する画像におけ
る走査線全部を、しかし、走査線のみをスクリーニング
することを許す。レンダリング対象と交差する走査線の
みがスクリーニングされる。配列４１３のようなデータ
構造が、走査線がスクリーニングされるべきかどうかを
指示する。０値は走査線が印刷予定にないことを意味
し、１値は印刷予定であることを意味する。走査線法で
は、レンダリングモジュールによって１ページ全部がレ
ンダリングされた後に２つの出力が存在する。第１出力
はレンダリング済みモジュールすべてを含むレンダリン
グ済みページである。このレンダリング済みページの各
画素は８ビットである。第２出力は当該ページ内の走査
線数に等しい要素数を持つ走査線配列である。ここでの
各要素は、その走査線がスクリーニングを必要としてい
るかどうかを指示する１または０を有する。The scan line method screens every scan line of the image that contains valid pixels, and only those scan lines. Only scan lines that intersect a rendered object are screened. A data structure such as array 413 indicates whether each scan line should be screened: a value of 0 means the scan line will not be printed, and a value of 1 means it will be printed. In the scan line method, there are two outputs after the entire page has been rendered by the rendering modules. The first output is the rendered page containing all the rendered objects; each pixel of this rendered page has 8 bits. The second output is the scan line array, whose number of elements equals the number of scan lines in the page. Each element holds a 1 or a 0 indicating whether the corresponding scan line requires screening.

【００５１】 Consider the example page shown in FIG. 10. This page has a trapezoid that starts at line 10 and ends at line 15, and characters in a rendered font that start at line 14 and end at line 31. All elements of the scan line array are initialized to 0. As rendering proceeds, rendering module 501 writes a 1 into the scan line array at each position corresponding to a line on which an object was rendered and which therefore needs screening. In this example the scan line array thus holds 0 for scan lines 0 through 9, 1 for scan lines 10 through 31, and 0 for scan lines 32 and above. Screening module 503 receives these inputs and screens only the rows for which the scan line array is 1, that is, scan lines 10 through 31. The screened scan lines are printed in print operation 503.
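
The scan line array of this example can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the page height of 40 lines are hypothetical, and each rendered object is reduced to its first and last scan line.

```python
def mark_scan_lines(page_height, objects):
    """Return one 0/1 flag per scan line; 1 means the line needs screening."""
    flags = [0] * page_height                 # every element starts at 0
    for first_line, last_line in objects:     # inclusive line range of one object
        for line in range(first_line, last_line + 1):
            flags[line] = 1                   # an object was rendered on this line
    return flags


def lines_to_screen(flags):
    """List the scan lines whose flag is 1."""
    return [line for line, flag in enumerate(flags) if flag == 1]


# Trapezoid on lines 10..15, text on lines 14..31 (page height is an assumption).
flags = mark_scan_lines(40, [(10, 15), (14, 31)])
selected = lines_to_screen(flags)
```

Only the lines in `selected` would be handed to the screening stage; all other lines are skipped.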

【００５２】 This approach is simple. It requires only minor changes to the rendering module and to the screening implementation. The method is very useful for text images, which contain many blank scan lines. Because only non-blank scan lines are screened, the time savings are substantial.

【００５３】 FIG. 11 shows the structure of the three-dimensional lookup table typically used in conventional screening. The pixel location, given by its X and Y coordinates, is indexed modulo the dimensions of an M×N preference matrix. Thus the pixel X coordinate selects one row of the preference matrix at X modulo M. Similarly, the pixel Y coordinate selects one column of the preference matrix at Y modulo N.

【００５４】 FIG. 12 is an example of a 4×4 preference matrix. The data at the accessed location in the preference matrix points to one of a set of lookup tables. Each element of the preference matrix holds a lookup table number. A pixel that maps to element (0,0) of the preference matrix uses the first lookup table, LUT[0]. A pixel that maps to (0,1) uses LUT[1]. A pixel that maps to (0,2) uses LUT[1]. A pixel that maps to (0,3) uses LUT[2]. In this way the preference matrix specifies the lookup table used to screen each pixel of the input image. Lookup tables are determined in the same way for pixels that map to (1,0) through (1,3), (2,0) through (2,3), and (3,0) through (3,3). In the 4×4 preference matrix example of FIG. 12, for a given pixel at (X,Y), the preference matrix element at (X modulo 4, Y modulo 4) selects the lookup table to be used. Thus the lookup table for the pixel at (0,5), which maps to (0,1), is LUT[1]. The lookup table for the pixel at (7,8), which maps to (3,0), is LUT[0]. The input pixel location is thereby mapped onto the preference matrix to select the appropriate lookup table.

【００５５】 Returning to FIG. 11, the modulo indexing selects one of the set of lookup tables. The pixel gray scale value is the index into the selected lookup table. If the pixel has b bits, each lookup table has 2^b entries. Each entry holds c bits of data, within the dynamic range of the printing device, giving the corresponding screened output pixel of c bits. The screening value V of the pixel at (x, y) is therefore given by

【数１】 V = LUT[preference_matrix[x%m][y%n]][image[x][y]]
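
The lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: only row 0 and element (3,0) of the preference matrix follow the values given for FIG. 12, while the remaining matrix entries, the three table contents (`BIASES`) and the function names are hypothetical.

```python
M, N = 4, 4

# Row 0 and element (3,0) follow the text; all other entries are made up.
PREFERENCE_MATRIX = [
    [0, 1, 1, 2],
    [1, 2, 0, 1],
    [2, 0, 1, 0],
    [0, 2, 2, 1],
]

# Hypothetical tables: 2**8 entries each (8-bit input), 4-bit output values.
BIASES = [0, 5, 10]
LUT = [[min(15, (gray + bias) >> 4) for gray in range(256)] for bias in BIASES]


def lut_number(x, y):
    """Select the lookup table number for pixel (x, y) by modulo indexing."""
    return PREFERENCE_MATRIX[x % M][y % N]


def screen_pixel(image, x, y):
    """V = LUT[preference_matrix[x % m][y % n]][image[x][y]]."""
    return LUT[lut_number(x, y)][image[x][y]]


# A flat 8 x 9 test image with gray value 139, indexed as image[x][y].
image = [[139] * 9 for _ in range(8)]
v_a = screen_pixel(image, 0, 5)   # pixel (0,5) maps to element (0,1), LUT[1]
v_b = screen_pixel(image, 7, 8)   # pixel (7,8) maps to element (3,0), LUT[0]
```

With these made-up tables the same gray value screens to different output levels at the two positions, which is the spatial variation the preference matrix is meant to introduce.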

【００５６】 This prior art technique places several demands on the available on-chip memory. The preference matrix has a row size of up to 512. This requires a 1K byte area of on-chip memory, including the memory needed to handle odd preference matrix row dimensions, which are discussed in a later section. The processor integrated circuit requires buffers for input and output. Allocating 2K bytes to the input/output buffers and using two buffers for input/output requires 4K bytes of memory. When the multiprocessor integrated circuit 100 described above is used, about 0.5K bytes are needed as parameter space for defining transfer requests. These memory requirements total about 5.5K bytes. When digital image/graphics processors 71, 72, 73 and 74 of multiprocessor integrated circuit 100 are used, these memory requirements leave only about 2K bytes for lookup tables. This means that at most eight lookup tables can reside in the on-chip memory of digital image/graphics processors 71, 72, 73 and 74.
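
The memory budget above is simple arithmetic and can be checked with a short sketch. The assumptions are labeled in the code: the "about 2K" remainder is taken as exactly 2048 bytes, input pixels are 8 bits, and each table entry occupies 8 bits (or 4 bits in the second call). The function name is hypothetical.

```python
KIB = 1024

pref_row_bytes = 1 * KIB       # preference matrix row area
io_buffer_bytes = 4 * KIB      # input/output buffers
param_bytes = KIB // 2         # transfer request parameter space
committed = pref_row_bytes + io_buffer_bytes + param_bytes

lut_space_bytes = 2 * KIB      # "about 2K" in the text, assumed exact here


def luts_that_fit(space_bytes, pixel_bits, entry_bits):
    """Number of whole lookup tables that fit in space_bytes."""
    entries = 2 ** pixel_bits               # one entry per input gray level
    table_bytes = entries * entry_bits // 8
    return space_bytes // table_bytes


tables_8bit = luts_that_fit(lut_space_bytes, pixel_bits=8, entry_bits=8)
tables_4bit = luts_that_fit(lut_space_bytes, pixel_bits=8, entry_bits=4)
```

Under these assumptions a 256-entry table with byte-wide entries takes 256 bytes, so eight tables fill the 2K region; halving the entry width doubles the count.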

【００５７】 Many practical implementations use 4-bit data in the printing device image. Most data processors provide a minimum addressable unit of 8 bits, that is, one byte. Two 4-bit pixels are therefore processed together and packed into one 1-byte output. This poses no problem if the preference matrix has an even number of elements per row. Consider a preference matrix with a row dimension of 6: the screened outputs of pixels 0 and 1 are written to output address 0, pixels 2 and 3 to output address 1, and pixels 4 and 5 to output address 2.

【００５８】 FIG. 13 illustrates the prior art problem that arises with a preference matrix whose row dimension is an odd number of elements. In this example the preference matrix has a row dimension of 3. The odd number of elements causes a problem when nibbles are packed into bytes. The screened outputs of pixels 0 and 1 are written to output address 0. Processing pixel 2 produces a single 4-bit output. Because the output memory is byte addressable and not 4-bit addressable, this 4-bit output cannot be written to the output memory on its own. This special case requires extra handling in the form of a read-modify-write operation, which lowers throughput.
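
The packing problem can be made concrete with a small sketch. The helper name and the choice to put the first pixel in the high nibble are assumptions made for illustration (the high-nibble-first order matches the shift in the pseudocode given later in this description).

```python
def pack_nibbles(nibbles):
    """Pack 4-bit values two per byte, first value in the high nibble.

    Returns (packed_bytes, leftover), where leftover is the unpaired last
    nibble for an odd-length input, or None for an even-length input.
    """
    packed = bytearray()
    for i in range(0, len(nibbles) - 1, 2):
        packed.append((nibbles[i] << 4) | nibbles[i + 1])
    leftover = nibbles[-1] if len(nibbles) % 2 else None
    return bytes(packed), leftover


even_case = pack_nibbles([1, 2, 3, 4, 5, 6])   # row dimension 6: three full bytes
odd_case = pack_nibbles([1, 2, 3])             # row dimension 3: one byte + stray nibble
```

In the odd case the stray nibble cannot be stored by itself in byte-addressable memory, which is what forces the read-modify-write handling described above.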

【００５９】 FIG. 14 schematically shows the method proposed by the present invention to solve this problem. A cache of lookup tables is maintained in on-chip memory. As calculated above, multiprocessor integrated circuit 100 can hold eight lookup tables in on-chip memory at any one time. To simplify this caching, each preference matrix row is divided into preference segments. This removes the limit on the maximum number of lookup tables.

【００６０】 The input image is processed one scan line at a time. Each row of the preference matrix is divided into preference segments of eight elements each. As shown in the example of FIG. 14, a preference matrix with a row dimension of 16 is divided into preference segment 0, holding elements 0 through 7, and preference segment 1, holding elements 8 through 15. The current input line is processed one preference segment at a time. The lookup tables belonging to the first preference segment are brought into on-chip memory, and all pixels corresponding to this segment are processed and output. The process is then repeated for the remaining segments in turn. Note that the memory organization of the data memories associated with digital image/graphics processors 71, 72, 73 and 74 permits data transfers in units of these preference segments.
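
The segment-by-segment processing can be sketched as below. This is an illustration under assumptions, not the on-chip implementation: the 16 tables in `ALL_LUTS`, the preference row, the test line and the function names are hypothetical, and the "cache" is just a Python list standing in for the tables resident in on-chip memory.

```python
SEGMENT = 8  # elements per preference segment, as in the text


def split_row(pref_row):
    """Divide one preference matrix row into segments of SEGMENT elements."""
    return [pref_row[i:i + SEGMENT] for i in range(0, len(pref_row), SEGMENT)]


def screen_line_by_segments(line, pref_row, all_luts):
    """Screen one scan line, holding only one segment's tables at a time."""
    out = [None] * len(line)
    width = len(pref_row)
    for seg_index, segment in enumerate(split_row(pref_row)):
        cache = [all_luts[n] for n in segment]      # at most SEGMENT tables
        base = seg_index * SEGMENT
        for tile_start in range(0, len(line), width):
            for k in range(len(segment)):
                x = tile_start + base + k           # pixel served by element k
                if x < len(line):
                    out[x] = cache[k][line[x]]
    return out


# Hypothetical data: 16 tables, a 16-element preference row, a 40-pixel line.
ALL_LUTS = [[min(15, (g + 4 * n) >> 4) for g in range(256)] for n in range(16)]
PREF_ROW = [(3 * i) % 16 for i in range(16)]
LINE = [(37 * x) % 256 for x in range(40)]
result = screen_line_by_segments(LINE, PREF_ROW, ALL_LUTS)
```

The output is identical to screening each pixel directly with `ALL_LUTS[PREF_ROW[x % 16]]`, but no more than eight tables are needed at once.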

【００６１】 Without segmentation, the processing either wasted a great deal of time waiting for lookup tables to be transferred on-chip, or required all of the distinct lookup tables to be held on-chip. The preference segment method of the present invention has neither drawback and makes screening possible by caching preference segments.

【００６２】 To simplify processing, each entry of a lookup table has 8 bits. When a preference segment is processed, eight input elements are screened into 4 bytes. The output buffer is built up from such 4-byte segments. This reduces the bandwidth utilization of transfer controller 80 to 50%. It is also a consequence of the fact that only eight lookup tables can be held in on-chip memory. If the lookup table entries were 4-bit entries, 16 lookup tables could be cached. This would permit preference segments of 16 elements and produce an 8-byte output, giving 100% utilization of transfer controller 80.

【００６３】 FIG. 15 illustrates the method of the present invention for handling a preference matrix with an odd row dimension. If the preference matrix row dimension is odd, the preference matrix is doubled, which makes the dimension even. As shown in FIG. 15, six input pixels of 8 bits each are screened into six 4-bit nibbles and packed two to a byte. The preference matrix is doubled by replicating it along that dimension. This doubles the tile size, but each such tile is made up of two identical halves. Pixel coordinates in the doubled direction are indexed modulo 2M rather than modulo M. The doubling requires more space to store the preference table. However, it reduces computational complexity and makes the computation uniform.
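
A minimal sketch of the doubling step, assuming a preference row held as a Python list (the function name is hypothetical):

```python
def double_if_odd(pref_row):
    """Replicate an odd-length preference row so its length becomes even."""
    row = list(pref_row)
    return row + row if len(row) % 2 else row


row = [0, 1, 2]              # odd row dimension M = 3
tile = double_if_odd(row)    # doubled tile of 2M = 6 elements
```

Indexing the doubled tile modulo 2M selects the same element as indexing the original row modulo M, and because 2M is even a pixel pair never straddles the end of the tile.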

【００６４】 A brief description of the method is given below in pseudocode. In this example the input buffer size is 2K bytes and the output buffer size is 1K bytes.

// Process one row of the image at a time, from row = 0 to the image height
for row = 0 to image height
    pref_row_num = image_y % preference matrix height
    transfer preference_matrix[pref_row_num]
    if the preference matrix width is odd, replicate it in the on-chip buffer
    // The preference row is divided into preference segments, each 8 entries long.
    // The input is processed one preference segment at a time. pref_count is
    // the integer number of such preference segments in the preference row.
    pref_count = pref_row_size / 8
    for i = 0 to pref_count-1
        get preference_segment[i]
        get LUTBLOCK[i]
        get the input block corresponding to preference_segment[i]
        screen the input
        // Read 2 bytes from the input and screen each to a 4-bit value.
        // Combine these values into an 8-bit value and write it to the output buffer.
        for (m = 0; m < PAGE_WIDTH; m += 8)
            for (k = 0; k < 8; k += 2)
                *output++ = (LUT[k][input[m+k]] << 4) | LUT[k+1][input[m+k+1]]
        end for
        transfer output of size PAGE_WIDTH/2
    end for
end for

【００６５】 This is an implementation that uses the resources of only one of digital image/graphics processors 71, 72, 73 and 74 and does not consume the resources of the other processors. Screening is confined to one of these processors, leaving the other processors free to perform other operations independently.

【００６６】 By suitably placing the lookup tables, the input/output buffers and the preference matrix rows in on-chip memory, the double buffering scheme can be extended to the lookup tables and the preference matrix rows. This avoids waiting for lookup tables to load when the next preference segment is processed, and waiting for the preference matrix row to load when the next scan line is processed.

【００６７】 Because each screened output value is one nibble (4 bits) while memory locations are byte (8-bit) addressable, the core screening method must process two pixels at a time. The prior art core screening method therefore has the following steps.
Step 1: Screen the pixel pointed to by input_pointer to 4 bits and hold it in a first temporary memory location.
Step 2: Increment input_pointer.
Step 3: Increment pref_pointer.
Step 4: Screen the pixel pointed to by input_pointer to 4 bits and hold it in a second temporary memory location.
Step 5: Increment input_pointer.
Step 6: Increment pref_pointer.
Step 7: Pack the first and second temporary nibbles into 8 bits.
Step 8: Store the packed value at the location pointed to by output_pointer.
Step 9: Increment output_pointer.
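
The nine steps can be sketched as one function. This is only an illustration: Python list positions stand in for the pointers, appending to a `bytearray` stands in for the store and the output pointer increment, and the two tiny tables are hypothetical.

```python
def screen_pair(inp, in_pos, pref, pref_pos, luts, out):
    """Screen two pixels and append one packed byte; return the new positions."""
    first = luts[pref[pref_pos]][inp[in_pos]]     # step 1
    in_pos += 1                                   # step 2
    pref_pos += 1                                 # step 3
    second = luts[pref[pref_pos]][inp[in_pos]]    # step 4
    in_pos += 1                                   # step 5
    pref_pos += 1                                 # step 6
    packed = (first << 4) | second                # step 7
    out.append(packed)                            # steps 8 and 9
    return in_pos, pref_pos


# Hypothetical tables: table 0 keeps the top 4 bits, table 1 inverts them.
LUTS = [[g >> 4 for g in range(256)], [15 - (g >> 4) for g in range(256)]]
out = bytearray()
positions = screen_pair([0x80, 0xF0], 0, [0, 1], 0, LUTS, out)
```

Note that the function itself never checks whether `pref_pos` has run past the end of the preference row; where that check happens is exactly what distinguishes the loop structures discussed next.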

【００６８】 When a loop is set up to run over all the pixels of one row and the row length equals L, the loop count is L/2. If the preference matrix pointer is incremented and checked inside the loop once per pixel pair, the tile size of the preference matrix must be even. When M is even, a single pointer (pointing into the preference matrix row) that wraps around an array of size M can be used directly in the loop. To extend the same idea to the odd case, the scan line must be tiled modulo 2M. The preference matrix pointer can then be checked once per pixel pair, and the core method of screening two pixels at a time can still be used.

【００６９】 FIG. 16 schematically shows the prior art method of indexing into the lookup tables when screening with a preference matrix having an odd M. To screen a row of pixels by the prior art method when M is odd, an outer loop is set up that runs over the total number of pixels. Inside this loop, for each pixel pair, a program check is made that resets the circular pointer to the start of the array whenever it reaches 2M. As shown in FIG. 16, when loop_pref_pointer reaches pref_pointer_end, loop_pref_pointer is reset to pref_pointer_start.

【００７０】 The prior art screening loop has the following steps.
Step 1: Set loop_pref_pointer to pref_pointer_start.
Step 2: Repeat steps 3 and 4 for i = 1 to L/2.
Step 3: [All steps of the core screening method]
Step 4: Check whether loop_pref_pointer equals pref_pointer_end. If true, reset the pointer to the start of the array, that is, set loop_pref_pointer to pref_pointer_start. Otherwise, continue the loop.
The same method extends to a preference matrix with even M. In that case the scan line is divided into tiles of modulus M, and the pointer is reset every M pixels, that is, each time the pointer reaches pref_pointer_end. Note that pref_pointer_end is set to pref_pointer_start + M - 1. The screening loop has the same steps as in the odd M case.
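
The prior art loop can be sketched as follows, assuming an even line length. The names are hypothetical, the end check here compares against a one-past-the-end position rather than the last element, and the `checks` counter is added only to make the per-pair cost visible.

```python
def screen_line_prior_art(line, pref_row, luts):
    """Screen one line with a wraparound check after every pixel pair."""
    m = len(pref_row)
    # For odd M the row is replicated, giving a circular array of 2M entries.
    table = list(pref_row) * 2 if m % 2 else list(pref_row)
    end = len(table)
    out = bytearray()
    pref_pos = 0
    checks = 0
    for i in range(len(line) // 2):                  # L/2 iterations
        hi = luts[table[pref_pos]][line[2 * i]]
        lo = luts[table[pref_pos + 1]][line[2 * i + 1]]
        pref_pos += 2
        out.append((hi << 4) | lo)
        checks += 1                                  # pointer check on every pair
        if pref_pos == end:
            pref_pos = 0                             # wrap to the array start
    return bytes(out), checks


# Hypothetical data: four tables, odd row dimension M = 3, 24-pixel line.
LUTS = [[(g >> 4) ^ n for g in range(256)] for n in range(4)]
PREF = [2, 0, 3]
LINE = [(41 * x) % 256 for x in range(24)]
packed, checks = screen_line_prior_art(LINE, PREF, LUTS)
```

The comparison runs once per pixel pair whether or not a wrap is due, and the odd case holds the row twice; both costs are taken up in the paragraphs that follow.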

【００７１】 The prior art method described above has poor throughput because the preference matrix modulo check is performed inside the loop. In the odd M case it also requires an array of 2M bytes to store the preference matrix row on-chip.

【００７２】 The method proposed by the present invention seeks to alleviate these problems by not checking the preference matrix pointer inside the loop. The proposed method also reduces the memory storage required for the preference matrix in the odd M case. The present invention uses the same core screening method as the prior art. The proposed method tiles the scan line by the preference matrix row size M when M is even and by 2M when M is odd, and sets up an outer loop and an inner loop. The inner loop consists of the core screening method and is executed M/2 times when M is even and M times when M is odd, each execution screening one pixel pair.

【００７３】 If a scan line does not start and end on tile boundaries, it is divided into three parts: a start part running up to the first tile (M or 2M) boundary, an end part running from the penultimate tile boundary to the end of the scan line, and a middle part made up of complete tiles. Partial inner loops are set up to screen the pixels in the start and end parts, while the middle part is processed with the outer and inner loops. For line lengths smaller than the tile size, a partial inner loop is used. The method with inner and outer loops is described below for both odd and even M.

【００７４】 FIG. 17 schematically shows the method of the present invention for indexing into the lookup tables when screening with a preference matrix having an odd M. The proposed method uses two pointers into the preference matrix array. The preference matrix row is stored in an array of size M+1. The first entry of this array is the Mth element of the preference matrix row. It is followed by the M elements of the preference matrix row. The scan line is divided modulo 2M, and the inner loop is split into two loops. One of these loops runs over pixels 1 to M+1 and the other over pixels 1 to M-1. On entry, the two inner loops use the M+1 and M-1 preference matrix start pointers respectively. Because M is odd, M+1 and M-1 are even, so the inner loops, which run over (M+1)/2 and (M-1)/2 pixel pairs, can still use the same core screening method. Within these loops the preference matrix pointer is only incremented. At the end of each loop, the preference matrix pointer is reset to the start of the M+1 or M-1 preference matrix array. The outer loop runs over the 2M tiles in the scan line.

【００７５】 The screening loop has the following steps.
Step 1: Calculate the number of tiles to be processed, tile_cnt = L/(2*M), which gives the outer loop count.
Step 2: Set loop_pref_pointer to pref_pointer_start_M-1.
Step 3: Repeat steps 4 through 9 for k = 1 to tile_cnt.
Step 4: Reset loop_pref_pointer to pref_pointer_start_M-1.
Step 5: Repeat step 6 for i = 1 to (M-1)/2.
Step 6: [All steps of the core screening method]
Step 7: Set loop_pref_pointer to pref_pointer_start_M+1.
Step 8: Repeat step 9 for i = 1 to (M+1)/2.
Step 9: [All steps of the core screening method]
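
The odd M scheme can be sketched as below. This is a reading of the description under assumptions, not a verified reproduction of FIG. 17: the line length is taken to be a whole number of 2M tiles, array positions 1 and 0 stand in for the M-1 and M+1 start pointers, and all names and data are hypothetical.

```python
def screen_line_odd_m(line, pref_row, luts):
    """Screen one line for odd M with no pointer check inside the inner loops."""
    m = len(pref_row)
    if m % 2 == 0 or len(line) % (2 * m) != 0:
        raise ValueError("expects odd M and a line made of whole 2M tiles")
    # M+1 entries: the last row element first, then the M row elements.
    table = [pref_row[-1]] + list(pref_row)
    out = bytearray()
    in_pos = 0
    for _ in range(len(line) // (2 * m)):            # outer loop over 2M tiles
        # First (M-1)/2 pairs start at entry 1, then (M+1)/2 pairs at entry 0.
        for start, pairs in ((1, (m - 1) // 2), (0, (m + 1) // 2)):
            pref_pos = start                         # reset outside the inner loop
            for _ in range(pairs):
                hi = luts[table[pref_pos]][line[in_pos]]
                lo = luts[table[pref_pos + 1]][line[in_pos + 1]]
                out.append((hi << 4) | lo)
                in_pos += 2
                pref_pos += 2                        # increment only, no check
    return bytes(out)


# Hypothetical data: four tables, M = 5, three 2M tiles (30 pixels).
LUTS = [[(g >> 4) ^ n for g in range(256)] for n in range(4)]
PREF = [2, 0, 3, 1, 0]
LINE = [(41 * x) % 256 for x in range(30)]
packed = screen_line_odd_m(LINE, PREF, LUTS)
```

The second inner loop begins on the last row element, which is why that element is stored once more at the front of the array; the result matches plain modulo-M indexing while storing M+1 entries instead of 2M.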

【００７６】 If a scan line does not start and end on 2M tile boundaries, its start and end parts are processed separately. They are handled with partial inner loops only (no outer loop is needed), with the preference matrix pointer starting at the appropriate M+1 or M-1 position. The order in which the M+1 and M-1 pointers are used depends on where within the 2M tile the scan line starts.

【００７７】 For even M, the proposed method divides the scan line into tiles of modulus M. There are two loops. The inner loop, which is executed M/2 times, uses the two-pixel core screening method. The outer loop runs over the number of tiles in the scan line to be screened. A pointer is incremented inside the inner loop; it is set to point to the start of the preference matrix array of size M on entry to the outer loop, and it is reset to point to the start of the preference matrix at the end of each inner loop.

【００７８】 The screening loop has the following steps.
Step 1: Calculate the number of tiles to be processed; tile_cnt = L/M gives the outer loop count.
Step 2: Set loop_pref_pointer to pref_pointer_start.
Step 3: Repeat steps 4 through 6 for k = 1 to tile_cnt.
Step 4: Reset loop_pref_pointer to pref_pointer_start.
Step 5: Repeat step 6 for i = 1 to M/2.
Step 6: [All steps of the core screening method]
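
For even M the same idea reduces to a plain nested loop. The sketch below assumes a line made of whole M tiles; names and data are hypothetical.

```python
def screen_line_even_m(line, pref_row, luts):
    """Screen one line for even M; the pointer is reset once per tile."""
    m = len(pref_row)
    if m % 2 != 0 or len(line) % m != 0:
        raise ValueError("expects even M and a line made of whole M tiles")
    out = bytearray()
    in_pos = 0
    tile_cnt = len(line) // m                  # step 1
    for _ in range(tile_cnt):                  # step 3: outer loop over tiles
        pref_pos = 0                           # step 4: reset outside inner loop
        for _ in range(m // 2):                # step 5: M/2 pixel pairs
            hi = luts[pref_row[pref_pos]][line[in_pos]]
            lo = luts[pref_row[pref_pos + 1]][line[in_pos + 1]]
            out.append((hi << 4) | lo)         # step 6: core method
            in_pos += 2
            pref_pos += 2
    return bytes(out)


# Hypothetical data: four tables, M = 4, three tiles (12 pixels).
LUTS = [[(g >> 4) ^ n for g in range(256)] for n in range(4)]
PREF = [2, 0, 3, 1]
LINE = [(41 * x) % 256 for x in range(12)]
packed = screen_line_even_m(LINE, PREF, LUTS)
```

The pointer reset runs L/M times per line instead of a comparison running L/2 times, which is the source of the instruction savings analyzed below.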

【００７９】 If a scan line does not start and end on M tile boundaries, its start and end parts are processed separately. They are handled with a partial inner loop only (no outer loop is needed), with the preference matrix pointer pointing to the starting entry.

【００８０】 The following analysis was performed to evaluate the throughput of the proposed method relative to the current method. The comparison is made in instructions per pixel processed. An instruction in this analysis means an arithmetic or addressing operation. The terms used in this discussion are:
IPP_CM = total instructions per pixel using the prior art method
LOOP-IPP_CM = total instructions per pixel executed in the loop using the prior art method
LS_CM = setup instructions per pixel using the prior art method
IPP_PM = total instructions per pixel using the method of the invention
LOOP-IPP-O_PM = total instructions per pixel executed in the loops using the method of the invention (M odd)
LOOP-IPP-E_PM = total instructions per pixel executed in the loops using the method of the invention (M even)
LS_PM = setup instructions per pixel using the method of the invention
All of the above terms are for a scan line of length L and a preference matrix row size M.

【００８１】 For the prior art method,
IPP_CM = LOOP-IPP_CM + LS_CM
where
LOOP-IPP_CM = (1 + (L/2)*11)/L
Here L/2 is the number of pixel pairs, and the number of instructions in the loop per pixel pair is 11.
LS_CM = 5/L
Here 5 is the number of instructions needed to set up the pointers into the preference matrix (x mod M and y mod N).

【００８２】 For the method of the invention with odd M,
IPP_PM = LOOP-IPP-O_PM + LS_PM
where
if L >> 2M, LOOP-IPP-O_PM = (2 + (L/M) + (L*9/2))/L
if L < 2M, LOOP-IPP-O_PM = (2 + L + (L*9/2))/L
Here the number of instructions in the inner loop per pixel pair is 9, as the L*9/2 term indicates, and 1 instruction in the outer loop resets the preference matrix pointer to the start of the array, that is, of the (M+1) or (M-1) array.
if L > M, LS_PM = 20/L
if L < M, LS_PM = 12/L
Here the number of instructions needed to set up the inner loops is 20 for L > M and 12 for L < M.

【００８３】 For the method of the invention with even M,
IPP_PM = LOOP-IPP-E_PM + LS_PM
where
if L >> M, LOOP-IPP-E_PM = (2 + (L/M) + (L*9/2))/L
if L < M, LOOP-IPP-E_PM = (2 + L + (L*9/2))/L
Here the number of instructions in the inner loop per pixel pair is 9, as the L*9/2 term indicates, and 1 instruction in the outer loop resets the preference matrix pointer to the start of the array.
if L > M, LS_PM = 20/L
if L < M, LS_PM = 12/L
Here the number of instructions needed to set up the inner loop is 20 for L > M and 12 for L < M. Note that the setup instruction counts are based on worst-case requirements. The loop counts assume zero-overhead loop counters, as supported by program flow control unit 130 of digital image/graphics processors 71, 72, 73 and 74 and as widely supported by digital signal processors.

【００８４】図１８は、従来技術方法に比較した、本発
明方法の走査線長に対する処理時間の百分率短縮のプロ
ットである。図１８において、処理時間は命令数に直接
関連があると考えられる。図１８は、様々な線長でＭが
８、９、８０および９０に等しい場合についての（１０
０−（ＩＰＰ_Pm＊１００）／ＩＰＰ_CM）％のプロットを
示している。L＿breakより長い線長の場合、提案方法は
現行の方法に比較して処理効率の革新的な増加を示す。
ブレーク長L＿breakは１６/（１−１／Ｍ）である。小
さな線長の場合は、提案方法のループセットアップとル
ープ外でのポインタリセットで発生する画素あたりのオ
ーバーヘッドが、従来技術の方法よりも走査線に対して
多くの命令を必要とする。線長が増加するにしたがっ
て、本発明方法のオーバーヘッドの相対的な寄与率は減
少する。損益分岐点はL＿breakにある。L＿breakを超え
ると、オーバーヘッドの相対的寄与率は大きく減少し、
最終的にはそれを無視できる飽和状態に到達する。提案
方法は、従来技術と比較して奇数Ｍの場合、（（Ｍ−
１）＊１００）／（２Ｍ）％つまり５０％弱だけチップ
上メモリ必要条件を減少させる。偶数Ｍに対しては、従
来技術と比較して１８０画素を超える長い線長の場合、
本提案方法は１５．８２％（偶数Ｍ＝８、９の場合）お
よび１８％（偶数Ｍ＝８０、９０の場合）の減少にな
る。奇数Ｍに対しては、従来技術と比較して１８０画素
を超える長い線長の場合、本提案方法は最大１６．１８
％（偶数Ｍ＝８、９の場合）および１８％（偶数Ｍ＝８
０、９０の場合）の減少になる。処理効率は線長を長さ
L＿breakより大きく増加させた場合に革新的に改善さ
れ、種々のＭ値に対して種々の最大改善値で飽和に達す
る。スクリーニングのための典型的な線長はL＿break
（１８画素）よりもずっと大きいので、本提案方法は著
しい利点を有する。
FIG. 18 is a plot of the percentage reduction in processing time versus scan line length for the method of the present invention as compared with the prior art method. In FIG. 18, processing time is taken to be directly related to the number of instructions. FIG. 18 plots (100 - (IPP_PM * 100) / IPP_CM) % for M equal to 8, 9, 80 and 90 at various line lengths. For line lengths greater than L_break, the proposed method shows a substantial gain in processing efficiency over the current method. The break length is L_break = 16 / (1 - 1/M). For short line lengths, the per-pixel overhead of the proposed method, incurred by loop setup and by pointer resets outside the loop, means that it needs more instructions per scan line than the prior art method. As the line length increases, the relative contribution of this overhead decreases. The break-even point is at L_break. Beyond L_break, the relative contribution of the overhead drops sharply and eventually saturates at a negligible level. For odd M, the proposed method reduces the on-chip memory requirement by ((M - 1) * 100) / (2M) %, that is, just under 50%, compared with the prior art. For even M and long line lengths (over 180 pixels), the proposed method yields a reduction of 15.82% (for M = 8, 9) and 18% (for M = 80, 90) compared with the prior art. For odd M and long line lengths (over 180 pixels), it yields a reduction of up to 16.18% (for M = 8, 9) and 18% (for M = 80, 90). Processing efficiency improves markedly once the line length exceeds L_break and saturates at a different maximum improvement for each value of M. Because typical line lengths for screening are much larger than L_break (18 pixels), the proposed method offers a significant advantage.
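The two closed-form expressions in this paragraph are easy to check numerically. The sketch below simply evaluates them as stated; the function names are hypothetical and nothing beyond the two formulas is taken from the text.

```python
def break_length(M: int) -> float:
    """L_break = 16 / (1 - 1/M), as stated in the text (requires M > 1)."""
    if M <= 1:
        raise ValueError("M must be greater than 1")
    return 16 / (1 - 1 / M)


def memory_reduction_percent_odd(M: int) -> float:
    """On-chip memory reduction for odd M: ((M - 1) * 100) / (2M) percent."""
    if M % 2 == 0:
        raise ValueError("the stated formula applies to odd M")
    return (M - 1) * 100 / (2 * M)


if __name__ == "__main__":
    for m in (8, 9, 80, 90):
        print("M =", m, "L_break =", round(break_length(m), 2))
    for m in (9, 89):
        print("M =", m, "memory reduction % =", round(memory_reduction_percent_odd(m), 2))
```

For M = 9 the break length evaluates to 18 pixels, in line with the figure of 18 pixels quoted above, and the memory reduction stays below 50% for every odd M because (M - 1) / (2M) is always less than 1/2.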

【００８５】本提案方法の応用は、埋め込み型のラスタ
ーイメージ（ＲＩＰ）ソフトウェアの必須部分である実
時間多水準閾値スクリーニングにおける利用が典型的で
ある。チップ上メモリを圧迫していたマルチプロセッサ
集積回路１００上でのスクリーニングの実装は、実時間
実行を満足させるためにメモリ必要条件と処理時間の釣
り合いを取らねばならない。本提案方法は賢明にも、画
素あたり最小オーバーヘッドとなる処理を持つ処理ルー
プを用いることによって、チップ上資源を配分する。こ
うして本提案方法は、メモリの点でも処理時間の点で
も、実時間埋め込み型実行という制約の達成に寄与す
る。この概念は、入力ビット数と出力ビット数が異なる
場合にも簡単に拡張できる。
The proposed method is typically applied to real-time multi-level threshold screening, which is an essential part of embedded raster image processor (RIP) software. An implementation of screening on the multiprocessor integrated circuit 100, where on-chip memory is scarce, must balance memory requirements against processing time in order to meet real-time execution. The proposed method allocates on-chip resources judiciously by using processing loops with minimal per-pixel overhead. It thus helps to meet the constraints of real-time embedded execution in terms of both memory and processing time. The concept extends readily to cases where the number of input bits and the number of output bits differ.

【００８６】以上の説明に関して更に以下の項を開示す
る。（１）より制限されたレンジの画像作成装置を用いて
グレースケール階調を近似するコンピュータ実装の方法
（computer implemented method）において、ページ記
述言語で表現された対象を画像作成装置の走査にレンダ
リングするステップと、レンダリングに基づいてレンダ
リング済み対象を有する画像領域を決定するステップ
と、前記レンダリング済み対象を有する画像領域内にお
いて前記走査の入力画素をより制限された範囲の画像画
素にスクリーニングするステップと、前記レンダリング
済み対象を有する画像領域外においては前記走査の入力
画素をスクリーニングしないステップとからなる前記方
法。（２）前記画像領域を決定するステップがレンダリン
グ済み対象をすべて包囲し該レンダリング済み対象を最
小に囲むバウンディングボックスを決定することからな
り、前記入力画素をスクリーニングするステップがどの
バウンディングボックス内の入力画素をもスクリーニン
グすることを含み、前記入力画素をスクリーニングしな
いステップがすべてのバウンディングボックス外にある
入力画素をスクリーニングしないことを含む、第１項記
載の方法。（３）前記画像領域を決定するステップがレンダリン
グ済み対象の一部を含む走査線を決定することからな
り、前記入力画素をスクリーニングするステップがレン
ダリング済み対象の一部を含むどの走査線上の入力画素
をもスクリーニングすることを含み、前記入力画素をス
クリーニングしないステップがレンダリング済み対象の
一部を全く含まないどの走査線上の入力画素をもスクリ
ーニングしないことを含む、第１項記載の方法。
With respect to the above description, the following items are further disclosed.
(1) A computer implemented method of approximating gray scale tones using an image forming apparatus of more limited range, comprising the steps of: rendering an object expressed in a page description language into a scan of the image forming apparatus; determining, based on the rendering, image regions having rendered objects; screening input pixels of the scan into image pixels of the more limited range within the image regions having rendered objects; and not screening input pixels of the scan outside the image regions having rendered objects.
(2) The method of item 1, wherein the step of determining the image regions comprises determining bounding boxes that enclose all rendered objects and minimally surround them; the step of screening input pixels includes screening input pixels inside any bounding box; and the step of not screening input pixels includes not screening input pixels outside all bounding boxes.
(3) The method of item 1, wherein the step of determining the image regions comprises determining scan lines that include part of a rendered object; the step of screening input pixels includes screening input pixels on any scan line that includes part of a rendered object; and the step of not screening input pixels includes not screening input pixels on any scan line that includes no part of a rendered object.
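The region-limited screening of items (1) to (3) can be illustrated with a short sketch. This is not the patented implementation: the box representation, the function names, the `screen_pixel` callback and the `blank` value written to unscreened pixels are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical box layout: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def screen_in_boxes(page: Sequence[Sequence[int]],
                    boxes: Sequence[Box],
                    screen_pixel: Callable[[int, int, int], int],
                    blank: int = 0) -> List[List[int]]:
    """Screen only pixels inside some bounding box; all others keep `blank`."""
    height = len(page)
    width = len(page[0]) if height else 0
    out = [[blank] * width for _ in range(height)]
    for left, top, right, bottom in boxes:
        # Clip each box to the page, then screen every pixel it covers.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                out[y][x] = screen_pixel(page[y][x], x, y)
    return out


def scan_lines_with_objects(boxes: Sequence[Box], height: int) -> List[int]:
    """Scan-line variant: indices of rows touched by at least one box."""
    rows = set()
    for _, top, _, bottom in boxes:
        rows.update(range(max(top, 0), min(bottom, height)))
    return sorted(rows)


if __name__ == "__main__":
    page = [[200] * 4 for _ in range(4)]
    bilevel = lambda value, x, y: 1 if value > 128 else 0
    print(screen_in_boxes(page, [(1, 1, 3, 3)], bilevel))
    print(scan_lines_with_objects([(1, 1, 3, 3)], 4))
```

In this sketch overlapping boxes cause the shared pixels to be screened twice, which is redundant but harmless because the callback is deterministic; a real implementation would presumably merge or clip overlapping regions.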

【００８７】（４）通信チャネルで双方向通信をする
ことができるトランシーバと、メモリと、受信した画像
データと制御信号に応じて印刷ページ上にカラードット
を配置することができるプリントエンジンと、前記トラ
ンシーバと、前記メモリと、前記プリントエンジンとに
接続されたプログラマブルデータプロセッサとからなる
プリンタであって、前記プログラマブルデータプロセッ
サは、印刷すべきページに対する印刷データを前記通信
チャネルから前記トランシーバを介して受信し、対応ペ
ージを印刷するために前記印刷データを前記プリントエ
ンジンに供給するための画像データと制御信号とに変換
するようプログラムされており、前記グレースケール階
調を近似することを含み、該近似は、ページ記述言語で
表現された対象を印刷すべきページの走査にレンダリン
グすること、レンダリングに基づいてレンダリング済み
対象を有する画像領域を決定すること、前記レンダリン
グ済み対象を有する画像領域内において前記走査の入力
画素をより制限された範囲の画像画素にスクリーニング
すること、前記レンダリング済み対象を有する画像領域
外においては前記走査の入力画素をスクリーニングしな
いこと、および対応ページを印刷するために前記画像デ
ータと制御信号に応じて前記プリントエンジンを制御す
ること、によって行われるよう構成した、プリンタ。（５）レンダリング済み対象をすべて包囲し該レンダ
リング済み対象を最小に囲むバウンディングボックスを
決定することによって画像領域を決定し、どのバウンデ
ィングボックス内の入力画素をもスクリーニングし、す
べてのバウンディングボックス外にある入力画素をスク
リーニングしないように、前記データプロセッサがさら
にプログラムされている、第４項記載のプリンタ。（６）レンダリング済み対象の一部を含む走査線を決
定し、レンダリング済み対象の一部を含むどの走査線上
の入力画素をもスクリーニングし、レンダリング済み対
象の一部を含まないどの走査線上の入力画素をもスクリ
ーニングしないように、前記データプロセッサがさらに
プログラムされている、第４項記載のプリンタ。
(4) A printer comprising: a transceiver capable of bidirectional communication over a communication channel; a memory; a print engine capable of placing color dots on a printed page in response to received image data and control signals; and a programmable data processor connected to the transceiver, the memory and the print engine, the programmable data processor being programmed to receive print data for a page to be printed from the communication channel via the transceiver and to convert the print data into image data and control signals for supply to the print engine to print the corresponding page, the conversion including approximating gray scale tones by: rendering an object expressed in a page description language into a scan of the page to be printed; determining, based on the rendering, image regions having rendered objects; screening input pixels of the scan into image pixels of a more limited range within the image regions having rendered objects; not screening input pixels of the scan outside the image regions having rendered objects; and controlling the print engine in accordance with the image data and control signals to print the corresponding page.
(5) The printer of item 4, wherein the data processor is further programmed to determine the image regions by determining bounding boxes that enclose all rendered objects and minimally surround them, to screen input pixels inside any bounding box, and not to screen input pixels outside all bounding boxes.
(6) The printer of item 4, wherein the data processor is further programmed to determine scan lines that include part of a rendered object, to screen input pixels on any scan line that includes part of a rendered object, and not to screen input pixels on any scan line that includes no part of a rendered object.

【００８８】（７）より制限されたレンジの画像作成
装置を用いてグレースケール階調を選好行列を介して近
似するコンピュータ実装の方法において、前記選好行列
の各行を少なくとも２つの固定寸法のセグメントに分割
するステップと、前記選好行列の前記セグメントの１つ
に関連づけられたルックアップテーブルをキャッシュメ
モりにロードするステップと、前記選好行列の前記１つ
のセグメントに関連づけられた前記ルックアップテーブ
ルを介して入力画素をスクリーニングし、このとき選択
された走査線上の画素はすべて前記選好行列の前記１つ
のセグメント内に位置しているようにするステップと、
前記選好行列の次のセグメントに関連づけられたルック
アップテーブルをキャッシュメモりにロードするステッ
プと、前記選好行列の前記次のセグメントに関連づけら
れた前記ルックアップテーブルを介して入力画素をスク
リーニングし、このとき選択された走査線上の画素はす
べて前記選好行列の前記次のセグメント内に位置してい
るようにするステップと、前記選好行列のすべてのセグ
メント内に位置する前記選択された走査線の画素がスク
リーニングされるまで継続するステップとからなる方
法。
(7) A computer implemented method of approximating gray scale tones via a preference matrix using an image forming apparatus of more limited range, comprising the steps of: dividing each row of the preference matrix into at least two segments of fixed size; loading a lookup table associated with one of the segments of the preference matrix into cache memory; screening input pixels via the lookup table associated with that one segment, for all pixels on a selected scan line that lie within that segment of the preference matrix; loading a lookup table associated with a next segment of the preference matrix into cache memory; screening input pixels via the lookup table associated with the next segment, for all pixels on the selected scan line that lie within the next segment of the preference matrix; and continuing until the pixels of the selected scan line lying within all segments of the preference matrix have been screened.
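The segment-by-segment processing of item (7) can be sketched as follows. This is an illustrative model only: it assumes 8-bit input pixels, bilevel output, a lookup table built by simple thresholding, and that pixel x of the scan line maps to matrix column x mod M. Building the table for one segment stands in for loading it into cache memory, and all names are hypothetical.

```python
from typing import List, Sequence


def build_segment_lut(thresholds: Sequence[int], levels: int = 256) -> List[List[int]]:
    """One table per matrix column in the segment: input value -> output pixel."""
    return [[1 if value > t else 0 for value in range(levels)] for t in thresholds]


def screen_line_segmented(line: Sequence[int],
                          matrix_row: Sequence[int],
                          seg_size: int) -> List[int]:
    """Screen one scan line, holding only one segment's lookup table at a time."""
    M = len(matrix_row)
    out = [0] * len(line)
    for start in range(0, M, seg_size):
        stop = min(start + seg_size, M)  # last segment may be shorter in this sketch
        # Stand-in for "load the segment's lookup table into cache memory".
        cached_lut = build_segment_lut(matrix_row[start:stop])
        # Screen every pixel of the line whose matrix column lies in this segment.
        for col in range(start, stop):
            for x in range(col, len(line), M):
                out[x] = cached_lut[col - start][line[x]]
    return out


if __name__ == "__main__":
    row = [10, 100, 200, 50]
    line = [20, 20, 20, 20, 250, 90, 210, 40]
    print(screen_line_segmented(line, row, seg_size=2))
```

The result is the same as screening the line pixel by pixel against the full row; what changes is that only the table for the current segment needs to be resident while that segment's pixels are processed.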

【００８９】（８）通信チャネルで双方向通信をする
ことができるトランシーバと、メモリと、受信した画像
データと制御信号に応じて印刷ページ上にカラードット
を配置することができるプリントエンジンと、前記トラ
ンシーバと、前記メモリと、前記プリントエンジンとに
接続されたプログラマブルデータプロセッサとからなる
プリンタであって、前記プログラマブルデータプロセッ
サは、印刷すべきページに対応する印刷データを前記通
信チャネルから前記トランシーバを介して受信し、対応
ページを印刷するために前記印刷データを前記プリント
エンジンに供給するための画像データと制御信号とに変
換するようプログラムされており、前記変換は、前記選
好行列の各行を少なくとも２つの固定寸法のセグメント
に分割すること、前記選好行列の前記セグメントの一つ
に関連づけられたルックアップテーブルを前記キャッシ
ュメモりにロードすること、前記選好行列の前記１つの
セグメントに関連づけられた前記ルックアップテーブル
を介して入力画素をスクリーニングし、このとき選択さ
れた走査線上の画素はすべて前記選好行列の前記１つの
セグメント内に位置しているようにすること、前記選好
行列の次のセグメントに関連づけられたルックアップテ
ーブルを前記キャッシュメモりにロードすること、前記
選好行列の前記次のセグメントに関連づけられた前記ル
ックアップテーブルを介して入力画素をスクリーニング
し、このとき選択された走査線上の画素はすべて前記選
好行列の前記次のセグメント内に位置しているようにす
ること、前記選好行列のすべてのセグメント内に位置す
る前記選択された走査線の画素をスクリーニングするま
で継続すること、および対応ページを印刷するために前
記画像データと制御信号に応じて前記プリントエンジン
を制御することによって行われるよう構成した、プリン
タ。
(8) A printer comprising: a transceiver capable of bidirectional communication over a communication channel; a memory; a print engine capable of placing color dots on a printed page in response to received image data and control signals; and a programmable data processor connected to the transceiver, the memory and the print engine, the programmable data processor being programmed to receive print data corresponding to a page to be printed from the communication channel via the transceiver and to convert the print data into image data and control signals for supply to the print engine to print the corresponding page, the conversion being performed by: dividing each row of the preference matrix into at least two segments of fixed size; loading a lookup table associated with one of the segments of the preference matrix into the cache memory; screening input pixels via the lookup table associated with that one segment, for all pixels on a selected scan line that lie within that segment of the preference matrix; loading a lookup table associated with a next segment of the preference matrix into the cache memory; screening input pixels via the lookup table associated with the next segment, for all pixels on the selected scan line that lie within the next segment of the preference matrix; continuing until the pixels of the selected scan line lying within all segments of the preference matrix have been screened; and controlling the print engine in accordance with the image data and control signals to print the corresponding page.

【００９０】（９）より制限されたレンジの画像作成
装置を用い、奇数行長を持つ選好行列を介してグレース
ケール階調を近似する一方、２つの出力画素を単一デー
タワードにパックするコンピュータ実装の方法におい
て、Ｍ−１個の入力画素とＭ＋１個の入力画素を交互に
考察し、それによってＭ−１個の入力画素とＭ＋１個の
入力画素の各組が偶数となるステップと、考察された入
力画素の各対に対して、対応する一対の出力画素を生成
するステップと、出力画素の各対を対応する１つの出力
データワードにパックするステップとを含む方法。
(9) A computer implemented method of approximating gray scale tones via a preference matrix having an odd row length M, using an image forming apparatus of more limited range, while packing two output pixels into a single data word, comprising the steps of: alternately considering M-1 input pixels and M+1 input pixels, so that each set of M-1 input pixels and M+1 input pixels is even in number; generating, for each pair of input pixels considered, a corresponding pair of output pixels; and packing each pair of output pixels into a corresponding single output data word.
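The alternation of M-1 and M+1 input pixels in item (9) can be sketched as below. The sketch makes several assumptions that are not in the text: 4-bit output pixels packed with the first pixel in the high bits, an even number of pixels per line, matrix column x mod M for pixel x, and a hypothetical `screen_pixel(value, column)` callback.

```python
from typing import Callable, Iterator, List, Sequence


def chunk_sizes(M: int, L: int) -> Iterator[int]:
    """Yield alternating chunk lengths M-1, M+1, M-1, ... that cover L pixels."""
    remaining = L
    use_short = True
    while remaining > 0:
        size = min(M - 1 if use_short else M + 1, remaining)
        yield size
        remaining -= size
        use_short = not use_short


def screen_and_pack(line: Sequence[int],
                    M: int,
                    screen_pixel: Callable[[int, int], int],
                    bits: int = 4) -> List[int]:
    """Screen pixel pairs chunk by chunk and pack each pair into one word."""
    if M < 3 or M % 2 == 0:
        raise ValueError("this sketch handles odd M >= 3")
    if len(line) % 2:
        raise ValueError("this sketch assumes an even number of pixels per line")
    words = []
    x = 0
    for size in chunk_sizes(M, len(line)):
        # M-1 and M+1 are both even, so every chunk holds whole pixel pairs.
        for i in range(x, x + size, 2):
            first = screen_pixel(line[i], i % M)
            second = screen_pixel(line[i + 1], (i + 1) % M)
            words.append((first << bits) | second)
        x += size
    return words


if __name__ == "__main__":
    demo = lambda value, col: (value + col) % 16
    print(list(chunk_sizes(3, 8)))
    print(screen_and_pack(list(range(8)), 3, demo))
```

Because both chunk lengths are even when M is odd, no pixel pair straddles a chunk boundary, so any per-chunk pointer handling can sit outside the inner pixel-pair loop.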

【００９１】（１０）通信チャネルで双方向通信をす
ることができるトランシーバと、メモリと、受信した画
像データと制御信号に応じて印刷ページ上にカラードッ
トを配置することができるプリントエンジンと、前記ト
ランシーバと、前記メモリと、前記プリントエンジンと
に接続されたプログラマブルデータプロセッサとからな
るプリンタであって、前記プログラマブルデータプロセ
ッサは、印刷すべきページに対応する印刷データを前記
通信チャネルから前記トランシーバを介して受信し、対
応ページを印刷するために前記印刷データを前記プリン
トエンジンに供給するための画像データと制御信号に変
換するようプログラムされており、前記変換は、Ｍ−１
個の入力画素とＭ＋１個の入力画素を交互に考察し、そ
れによってＭ−１個の入力画素とＭ＋１個の入力画素の
各組が偶数となること、考察された入力画素の各対に対
して、対応する一対の出力画素を生成すること、出力画
素の各対を対応する１つの出力データワードにパックす
ることとによって、グレースケール階調をより制限され
たレンジの画像作成装置を用いて奇数行長を持つ選好行
列を介して近似する一方、２つの出力画素を単一データ
ワードにパックすること、および対応ページを印刷する
ために前記画像データと制御信号に応じて前記プリント
エンジンを制御することによって行われるよう構成し
た、プリンタ。
(10) A printer comprising: a transceiver capable of bidirectional communication over a communication channel; a memory; a print engine capable of placing color dots on a printed page in response to received image data and control signals; and a programmable data processor connected to the transceiver, the memory and the print engine, the programmable data processor being programmed to receive print data corresponding to a page to be printed from the communication channel via the transceiver and to convert the print data into image data and control signals for supply to the print engine to print the corresponding page, the conversion being performed by: approximating gray scale tones via a preference matrix having an odd row length, using an image forming apparatus of more limited range, while packing two output pixels into a single data word, by alternately considering M-1 input pixels and M+1 input pixels, so that each set of M-1 input pixels and M+1 input pixels is even in number, generating, for each pair of input pixels considered, a corresponding pair of output pixels, and packing each pair of output pixels into a corresponding single output data word; and controlling the print engine in accordance with the image data and control signals to print the corresponding page.

【００９２】（１１）本発明は、より制限されたレン
ジの画像作成装置、すなわち、スクリーニングとして公
知の処理を用いてグレースケール階調を近似することを
含む。本発明は、スクリーニングが必要とされない時を
識別することによってスクリーニングに必要な時間を短
縮する。第１実施例では、レンダリング処理（４０１）
がレンダリング済み対象のすべてを包囲する最小包囲バ
ウンディングボックス（４０３，４０４）を生成する。
代替実施例では、レンダリング済み対象の一部を含む走
査線に注目する（４１３）。このスクリーニングは、選
好行列の各行をセグメントに分割することによってメモ
リ使用をよりよくさせる。これらのセグメントに関連づ
けられたルックアップテーブルが順次メモりキャッシュ
にロードされる。ロードされたセグメントルックアップ
テーブル内に位置する入力画素がスクリーニングされ
る。その後、選好行列の次のセグメントに関連づけられ
たルックアップテーブルがメモリキャッシュにロードさ
れ、そのセグメント内に位置する入力画素をスクリーニ
ングするために使用される。本方法は、選好行列が奇数
行長を持つときでも、Ｍを行長として、Ｍ−１個の入力
画素とＭ＋１個の入力画素を交互に考察することによっ
て、多水準スクリーニングを行いながら２つの出力画素
を１つのデータワードにパックする。
(11) The present invention involves approximating gray scale tones using an image forming apparatus of more limited range, a process known as screening. The invention shortens the time needed for screening by identifying when screening is not required. In a first embodiment, the rendering process (401) generates minimal enclosing bounding boxes (403, 405) that surround all rendered objects. An alternative embodiment notes the scan lines (413) that include part of a rendered object. The screening makes better use of memory by dividing each row of the preference matrix into segments. The lookup tables associated with these segments are loaded into the memory cache one after another. Input pixels that fall within the segment whose lookup table is loaded are screened. The lookup table associated with the next segment of the preference matrix is then loaded into the memory cache and used to screen the input pixels that fall within that segment. Even when the preference matrix has an odd row length, the method packs two output pixels into one data word during multi-level screening by alternately considering M-1 input pixels and M+1 input pixels, where M is the row length.

[Brief description of the drawings]

【図１】本発明を用いるような画像処理システムのシス
テムアーキテクチャを示す図。FIG. 1 is a diagram showing a system architecture of an image processing system using the present invention.

【図２】本発明の好適な実施例を形成する単一集積回路
マルチプロセッサのアーキテクチャを示す図。FIG. 2 illustrates the architecture of a single integrated circuit multiprocessor that forms a preferred embodiment of the present invention.

【図３】図２に示すデジタルイメージ・グラフィックス
プロセッサの１つを示すブロック図。FIG. 3 is a block diagram showing one of the digital image/graphics processors shown in FIG. 2.

【図４】図２に示すデジタルイメージ・グラフィックス
プロセッサの演算のパイプライン段階を示す概略図。FIG. 4 is a schematic diagram showing the pipeline stages of operation of the digital image/graphics processor shown in FIG. 2.

【図５】本発明の好適な実施例におけるマスタープロセ
ッサのアーキテクチャを示す図。FIG. 5 is a diagram showing the architecture of a master processor in a preferred embodiment of the present invention.

【図６】マスタープロセッサの整数パイプライン演算を
示す図。FIG. 6 is a diagram showing an integer pipeline operation of the master processor.

【図７】マスタープロセッサの浮動小数点パイプライン
演算を示す図。FIG. 7 is a diagram showing a floating-point pipeline operation of the master processor.

【図８】ページ記述言語において明示された文書を印刷
するときに典型的に実行される各ステップを示す図。FIG. 8 is a diagram showing steps typically executed when printing a document specified in a page description language.

【図９】バウンディングボックス法の適用例を示す図。FIG. 9 is a diagram showing an application example of the bounding box method.

【図１０】スキャンライン法の適用例を示す図。FIG. 10 is a diagram showing an application example of a scan line method.

【図１１】従来技術スクリーニングにおいて典型的に使
用される３次元ルックアップテーブルの構造を示す図。FIG. 11 is a diagram showing the structure of a three-dimensional lookup table typically used in prior art screening.

【図１２】４ｘ４選好行列の一例を示す図。FIG. 12 is a diagram showing an example of a 4 × 4 preference matrix.

【図１３】奇数個の要素の行寸法を持つ選好行列の場合
に生じる従来技術の問題を示す図。FIG. 13 is a diagram illustrating a problem of the related art that occurs in the case of a preference matrix having a row size of an odd number of elements.

【図１４】本発明の方法の一様相を概略的に示す図。FIG. 14 schematically illustrates one aspect of the method of the present invention.

【図１５】奇数個の行寸法を持つ選好行列を処理する本
発明の方法を示す図。FIG. 15 is a diagram showing the method of the present invention for processing a preference matrix having an odd row dimension.

【図１６】奇数Ｍを持つ選好行列を有するスクリーニン
グ用のルックアップテーブルに指標付けする従来方法を
概略的に示す図。FIG. 16 schematically illustrates a conventional method of indexing a lookup table for screening with a preference matrix having an odd number M.

【図１７】奇数Ｍを持つ選好行列を有するスクリーニン
グ用のルックアップテーブルに指標付けする本発明の方
法を概略的に示す図。FIG. 17 schematically illustrates a method of the invention for indexing a look-up table for screening with a preference matrix having an odd number M.

【図１８】従来方法に比較した本発明方法の線長に対す
る処理時間の減少率をプロットした図。FIG. 18 is a diagram plotting the reduction rate of the processing time with respect to the line length of the method of the present invention as compared with the conventional method.

[Explanation of symbols]

４０１レンダリングモジュール４０３，４０５バウンディングボックス４０７スクリーニングモジュール４０９ページバッファ４１１印刷４１３走査線配列
401 Rendering module
403, 405 Bounding boxes
407 Screening module
409 Page buffer
411 Printing
413 Scan line array

フロントページの続き (72)発明者エス．ラビイインド国カルナタカ，バンガローレ，ガンガナガルエクステンション，フォースクロス，セカンドメイン，70 (72)発明者ビベククマルサクルアメリカ合衆国テキサス，プラノ，フェアモント 5901 (72)発明者アール．スリニバサンインド国バンガローレ，ビジャヤナガル，マルシイレイアウト 69 Ｆターム(参考） 2C087 BA02 BA03 BA04 BA06 BA07 BA12 BC05 BD24 5B021 BB02 DD20 GG05 LG07 LG08 5B057 CA01 CA18 CB01 CB16 CE13 CE20 CH04 CH14
Continuation of front page
(72) Inventor S. Rabiy, India, Karnataka, Bangalore, Ganganagar Extension, Force Cross, Second Maine, 70
(72) Inventor Vivek Kumar Sakul, United States, Texas, Plano, Fairmont 5901
(72) Inventor Earl. Srinivasan, India, Bangalore, Vijayanagal, Marushii Layout 69
F term (reference) 2C087 BA02 BA03 BA04 BA06 BA07 BA12 BC05 BD24 5B021 BB02 DD20 GG05 LG07 LG08 5B057 CA01 CA18 CB01 CB16 CE13 CE20 CH04 CH14

Claims

[Claims]

1. A computer implemented method of approximating gray scale tones using an image forming apparatus of more limited range, comprising the steps of: rendering an object expressed in a page description language into a scan of the image forming apparatus; determining, based on the rendering, image regions having rendered objects; screening input pixels of the scan into image pixels of the more limited range within the image regions having rendered objects; and not screening input pixels of the scan outside the image regions having rendered objects.

2. A printer comprising: a transceiver capable of bidirectional communication over a communication channel; a memory; a print engine capable of placing color dots on a printed page in response to received image data and control signals; and a programmable data processor connected to the transceiver, the memory and the print engine, the programmable data processor being programmed to receive print data for a page to be printed from the communication channel via the transceiver and to convert the print data into image data and control signals for supply to the print engine to print the corresponding page, the conversion including approximating gray scale tones by: rendering an object expressed in a page description language into a scan of the page to be printed; determining, based on the rendering, image regions having rendered objects; screening input pixels of the scan into image pixels of a more limited range within the image regions having rendered objects; not screening input pixels of the scan outside the image regions having rendered objects; and controlling the print engine in accordance with the image data and control signals to print the corresponding page.