RU2421933C2 - System and method for generating and reproducing a 3D video image

Info

Publication number: RU2421933C2
Application number: RU2009110511/09A
Authority: RU
Inventors: Артем Константинович Игнатов (RU); Виктор Валентинович Буча (RU); Михаил Николаевич Рычагов (RU)
Original assignee: Корпорация "САМСУНГ ЭЛЕКТРОНИКС Ко., Лтд."
Priority date: 2009-03-24
Filing date: 2009-03-24
Publication date: 2011-06-20
Also published as: RU2009110511A

Abstract

FIELD: information technologies.

SUBSTANCE: a system for generating and reproducing a 3D video image comprises a generation complex, a disparity estimation complex, an image synthesis complex and a reproduction complex. Also proposed is a method for estimating and refining disparity values for a stereo pair, which performs successive iterations that refine the initial disparity estimate and involves the following operations: the initial disparity is computed; the parameters of the disparity refinement filter are tuned depending on the iteration number; the kernel size of the disparity refinement filter is estimated depending on the iteration number; the disparity map is refined according to the filter parameters and kernel size; the convergence criteria are computed and evaluated, and if they are met, filtering is terminated; the disparity map is post-processed. Also proposed is a method for generating virtual images from a colour image and the corresponding disparity/depth map, given the camera parameters.

EFFECT: production of a high-quality image based on an accurate depth map formed from stereo pairs.

42 cl, 19 dwg

Description

The invention relates to devices and methods for processing stereo images and video streams. It can be used, in particular, to capture a three-dimensional scene from several cameras in real time, to compute the disparity between adjacent views, and to generate virtual views for reproducing a three-dimensional image on autostereoscopic displays.

Three-dimensional television originates from the idea of perceiving the depth of the observed scene on a television screen. Realizing this requires several stages of video signal processing, which typically include forming the three-dimensional image, transmitting it, and reproducing it.

A three-dimensional image can be formed from real data or generated by computer graphics. For the first case, highly sophisticated stereo disparity estimation methods have been developed. Developers of such systems have had to trade quality against performance, because computing a dense disparity map often requires a large number of algorithm iterations. In the second case, depth data are easy to obtain, since the scene geometry is fully defined and stored in the depth buffer (z-buffer).

Specifications of various three-dimensional formats are known from the prior art. The most popular solutions either convey the stereo effect through separate streams for the left and right eyes or transmit an image with a per-pixel depth map. The first approach is convenient for capturing stereoscopic video with a stereo camera, while the second allows multiple views (perspectives) to be generated.

Various compression methods have been developed for three-dimensional formats, including ones based on conventional codecs such as MPEG-2, MPEG-4 and H.264. Other methods add spatial prediction of P- and B-frames to the usual temporal prediction.

After decoding, the three-dimensional image must be prepared for visualization on the chosen device. Autostereoscopic monitors, which use a lenticular sheet or optical barriers to separate light rays spatially, are one example of such a device. In either case, a special procedure called blending is required to prepare a composite image made up of multiple virtual images that show the scene from different viewing angles (perspectives). By design, the three-dimensional monitor is responsible for properly separating the light rays that correspond to the different perspectives.

US patent applications 20040032980 [1] and 20030007681 [2] disclose a system and method for decoding video that is accompanied by scene depth information. The task of such a system is the stereo conversion of a single two-dimensional video frame, that is, forming a pair of stereo images from a video frame and its depth map. Methods [1] and [2] define the content of the depth map and a possible application of a system adapted to DIBR (depth-image-based rendering). The drawback of this approach is that these systems are designed to generate only one stereo pair, not to synthesize multiple views (perspectives). In addition, the method of computing the depth map is not disclosed; in other words, the depth map for each video frame must be computed before the stereo images are generated.

The three-dimensional television system described in US patent application 20050185711 [3] is the closest to the claimed invention with respect to the stages of forming, transmitting and reproducing stereo images, and it is taken as the prototype of the claimed invention. Although that system can capture and encode several stereo video streams simultaneously, its drawback is that the streams must be demultiplexed in a specific way for each particular visualization device. In this sense the system is not universal. In addition, virtual views are rendered by averaging the incoming video streams with a weighted sum with constant coefficients, which results in a low-quality three-dimensional image.

Thus, the problem addressed by the claimed invention is to create a system for forming and reproducing a three-dimensional video stream that works with visualization devices known from the prior art and provides sufficiently high three-dimensional image quality, i.e. a system free of the drawbacks of the prototype [3].

The technical result is achieved by developing a system and method for forming a three-dimensional scene from several cameras in real time, computing the disparity between adjacent views, and generating virtual views (images) for three-dimensional reproduction on autostereoscopic displays.

The claimed system for forming and reproducing a three-dimensional video image consists of:

- a formation complex, including:

  - more than one device for generating video streams;

  - means for storing the video streams and transmitting them over a network to the stereo frame extraction and disparity estimation modules;

  - a formation module configured to extract stereo frames from the plurality of video streams;

  - a stereo frame pre-processing module configured to extract multiple projections of the three-dimensional scene from the stereo frames, compensate for lens distortion, and rectify the images using the epipolar constraint;

- a disparity estimation complex, including:

  - a color converter configured to convert images into a perceptually uniform color space;

  - a first disparity module configured to produce a rough disparity estimate over the multiple projections of the three-dimensional scene;

  - a second disparity module configured to refine the rough disparity map;

  - a transmission unit configured to pack the disparity map and the projections into three-dimensional video frames and, after optional compression, transmit them to the three-dimensional playback device;

- an image synthesis complex, including:

  - a receiving unit configured to receive and unpack the three-dimensional video frames;

  - a synthesis unit configured to generate multiple virtual views of the observed three-dimensional scene for reproduction on a three-dimensional display;

- a reproduction complex, including:

  - a digital display with an optical system configured to separate the multiple virtual views spatially, so that each eye observes its own view of the reproduced three-dimensional scene;

  - a multiplexer configured to prepare a multiplexed image from the multiple virtual views in accordance with the characteristics of the optical system of the three-dimensional display.

The estimation and refinement of disparity values for a stereo pair is based on successive iterations that refine the initial disparity estimate. The method involves the following operations:

- the initial disparity is computed;

- the parameters of the disparity refinement filter are tuned depending on the iteration number;

- the kernel size of the disparity refinement filter is estimated depending on the iteration number;

- the disparity map is refined according to the filter parameters and kernel size;

- the convergence criteria are computed and evaluated, and if they are met, filtering is terminated;

- the disparity map is post-processed.
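The iteration structure listed above can be sketched as follows. This is only an illustration: the text does not specify the filter at this point, so an intensity-weighted averaging filter is swapped in as a stand-in, and the schedules `filter_params` and `kernel_radius`, the tolerance `tol`, and the rounding used as post-processing are all hypothetical choices.

```python
import math

def filter_params(iteration):
    """Hypothetical schedule: the similarity sigma shrinks with the iteration number."""
    return max(4.0, 32.0 / (iteration + 1))

def kernel_radius(iteration):
    """Hypothetical schedule: start with a wide kernel and narrow it later."""
    return max(1, 3 - iteration)

def refine_once(disp, image, radius, sigma):
    """One pass of a stand-in filter: average neighbouring disparities,
    weighting each neighbour by how similar its intensity is to the centre."""
    h, w = len(disp), len(disp[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    diff = image[y][x] - image[yy][xx]
                    wgt = math.exp(-(diff * diff) / (2.0 * sigma * sigma))
                    num += wgt * disp[yy][xx]
                    den += wgt
            out[y][x] = num / den  # den >= 1 because the centre pixel has weight 1
    return out

def refine_disparity(initial_disp, image, max_iters=5, tol=0.05):
    """Iteratively refine an initial disparity map; returns (map, iterations run)."""
    disp = [row[:] for row in initial_disp]  # leave the input untouched
    h, w = len(disp), len(disp[0])
    iterations = 0
    for it in range(max_iters):
        sigma = filter_params(it)       # tune filter parameters per iteration
        radius = kernel_radius(it)      # choose kernel size per iteration
        new = refine_once(disp, image, radius, sigma)
        change = sum(abs(new[y][x] - disp[y][x])
                     for y in range(h) for x in range(w)) / (h * w)
        disp = new
        iterations = it + 1
        if change < tol:                # convergence criterion (assumed form)
            break
    # Post-processing (assumed): snap to integer disparities.
    return [[round(v) for v in row] for row in disp], iterations
```

Calling `refine_disparity(rough_map, gray_image)` on a rough map with an isolated outlier inside a uniformly coloured region pulls the outlier back toward its neighbours while leaving the disparity step at the colour edge intact.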

The method, implemented by the system described above, for generating virtual images from a color image and the corresponding disparity/depth map with known camera parameters consists of the following operations:

- models of the virtual cameras are built;

- the virtual image generation function is defined;

- the parameters of the virtual image generation function are estimated;

- virtual images are generated according to the virtual image generation function;

- disocclusions present in the generated virtual images are eliminated.
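A minimal sketch of these operations for the simplest case, assuming rectified images and virtual cameras placed on the horizontal baseline. The generation function used here (shift each pixel horizontally by `alpha * disparity`), the even camera spacing in `virtual_camera_offsets`, and the neighbour-copy hole filling are illustrative assumptions, not the functions defined in the patent.

```python
def virtual_camera_offsets(n_views, spread=1.0):
    """Hypothetical camera model: n_views positions spaced evenly on the
    baseline, centred on the source camera (offset 0)."""
    if n_views == 1:
        return [0.0]
    step = spread / (n_views - 1)
    return [-spread / 2 + i * step for i in range(n_views)]

def fill_holes(row):
    """Fill disoccluded pixels (None) from the nearest valid pixel on the
    left, then fill any leading holes from the right."""
    last = None
    for x in range(len(row)):
        if row[x] is None:
            row[x] = last
        else:
            last = row[x]
    nxt = None
    for x in range(len(row) - 1, -1, -1):
        if row[x] is None:
            row[x] = nxt
        else:
            nxt = row[x]

def synthesize_view(image, disparity, alpha):
    """Forward-warp `image` to a virtual camera at baseline offset `alpha`."""
    h, w = len(image), len(image[0])
    out = [[None] * w for _ in range(h)]
    zbuf = [[-1.0] * w for _ in range(h)]  # largest disparity (nearest) wins
    for y in range(h):
        for x in range(w):
            d = disparity[y][x]
            xt = x + int(round(alpha * d))  # assumed generation function
            if 0 <= xt < w and d > zbuf[y][xt]:
                out[y][xt] = image[y][x]
                zbuf[y][xt] = d
    for row in out:
        fill_holes(row)                     # eliminate disocclusions
    return out
```

For example, `[synthesize_view(img, disp, a) for a in virtual_camera_offsets(5)]` would produce five views; the view at offset 0 reproduces the source image.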

Thus, the claimed invention differs from the prototype chiefly in that the proposed method and system produce a high-quality image based on an accurate depth map formed from stereo frames. In addition, disparity estimation and image synthesis techniques have been developed that run efficiently on parallel computing modules (for example, FPGAs and GPUs), which makes it possible to use the system in real time.

The procedure for obtaining and reproducing a three-dimensional video image includes a stereo content formation stage, a disparity estimation stage, a virtual image synthesis stage, and a three-dimensional image reproduction stage. The content formation stage uses several video cameras that produce multiple video streams of a dynamic scene. The disparity estimation stage uses computing units to estimate disparity in real time from the received video streams. The virtual image synthesis stage uses computing units to produce synthesized virtual images in real time from the computed disparity. The reproduction stage uses computing units and a display capable of showing a three-dimensional image based on the virtual images generated in the previous stage.

The essence of the claimed technical solution is explained in detail below with reference to the drawings.

Fig. 1. Schematic diagram of the claimed system showing the main stages.

Fig. 2. Detailed block diagram of the claimed system.

Fig. 3. Structure of the computing system used for disparity estimation and image synthesis.

Fig. 4. Detailed block diagram of the computing system with parallel channels used for disparity estimation and refinement according to the claimed invention.

Fig. 5. Operating algorithm of the stereo frame pre-processing module.

Fig. 6. Examples of different stereo frame layouts.

Fig. 7. Method of refining the disparity map.

Fig. 8. Principle of filtering the disparity map based on comparing pixel patches: 8.1 - comparison of two-dimensional blocks, 8.2 - comparison of one-dimensional horizontal blocks, 8.3 - comparison of one-dimensional vertical blocks.

Fig. 9. Disparity map estimation and the results of map refinement.

Fig. 10. Structure of the computing system used to produce multiple virtual images.

Fig. 11. Method of producing virtual images.

Fig. 12. Example of virtual camera placement.

Fig. 13. Modeling of a virtual video camera: 13.1 - virtual image truncated by parallel planes, 13.2 - virtual image plane.

Fig. 14. Illustration of producing a virtual image with elimination of disocclusions in the new images.

The claimed system consists of four main complexes: stereo content formation complex 101, disparity estimation complex 102, image synthesis complex 103, and reproduction complex 104.

The stereo content formation complex 101 includes a set of video cameras 106 that generate multiple video streams of the dynamic scene 105. The system must contain at least two video cameras, so that video frames of the scene can be captured from different viewing angles, that is, so that a stereo effect can be created. In the minimum configuration the system is equipped with a stereo camera. To store the video streams and transmit them to complex 102 for further processing, it is proposed to use a real-time video capture device 107, a storage device (memory) 108, or a network device 100.

The video streams from complex 101 pass through the intermediate devices 107, 108 or 100 to the input of complex 102, where they are used to form a stereo frame 109, which is typically composed of several images, each obtained from a particular video camera. Complex 102 also includes a computing device 1010, which runs the real-time disparity estimation algorithm 1011 on the stereo frame 109. Because estimating the disparity map is computationally expensive, it is advisable to use additional computing modules that work together with the computing device 1010.

The virtual image synthesis complex 103 includes a computing module 1012 (view synthesis module) that runs a real-time virtual image synthesis algorithm (method) on the stereo frame 109 and the disparity map computed by algorithm 1011. Complex 103 generates a given number of virtual images 1013 in accordance with the parameters of the virtual video cameras.

The three-dimensional reproduction complex 104 includes a computing module 1014 for multiplexing images and a display 1015 for three-dimensional reproduction of the virtual images 1013 generated in complex 103. The three-dimensional display 1015 includes an LCD panel 1016 on which the multiplexed image is shown. An optical device 1017 (lenticular sheet, parallax barrier, etc.) separates the multiplexed image in space so that each eye of the viewer 1018 observes the corresponding virtual image 1013, giving normal perception of the three-dimensional scene.
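As a rough illustration of what the multiplexing module does, the sketch below interleaves N views column by column, which corresponds to an idealized vertical parallax barrier with one view per pixel column. This layout is an assumption made for the example; real displays use patterns (often sub-pixel and slanted) dictated by their optical system, and the function name `multiplex_views` is hypothetical.

```python
def multiplex_views(views):
    """Interleave N equally sized views column by column:
    output column x is taken from view number x mod N."""
    n = len(views)
    h, w = len(views[0]), len(views[0][0])
    for v in views:
        if len(v) != h or any(len(row) != w for row in v):
            raise ValueError("all views must have the same size")
    return [[views[x % n][y][x] for x in range(w)] for y in range(h)]
```

With two views, even columns of the result come from the first view and odd columns from the second, so an optical element that directs alternate columns to different eyes delivers one view to each eye.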

Fig. 2 shows a detailed block diagram of the system according to the claimed invention. The stereo content formation complex 201 includes a video capture module 208, which extracts stereo frames 209 from the devices that generate stereo content. Usually, stereo frames 209 are obtained from a set of video cameras 205. Stereo frames 209 can also be recorded and stored on a memory device as video files 207. In the case of recorded stereo content, the video capture module 208 unpacks and parses the video file to extract the stereo frame 209. Another content source is network television (IPTV) 206, which delivers the video stream over a network. In this case, the video capture module 208 connects to the host computer, receives the video stream, and extracts the stereo frame 209.

The resulting stereo frame 209 usually consists of two or more pictures, each taken by a particular video camera. The pictures may be arranged top-to-bottom (layout 602) or left-to-right (layout 601) (see Fig. 6). For further processing, the stereo frame must be cut into separate images, one per picture. This is done in the stereo frame pre-processing module 210. Module 210 also rectifies the pictures and compensates for lens distortion. The improved images produced at the output of complex 201 are then fed to the input of the disparity estimation complex 202.
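The cutting step can be sketched as follows, assuming the frame is a list of pixel rows and the pictures are equally sized. The function name `split_stereo_frame` and the layout labels are hypothetical; rectification and lens distortion compensation are not shown.

```python
def split_stereo_frame(frame, n_views=2, layout="side_by_side"):
    """Cut a packed stereo frame into n_views separate images.

    layout="side_by_side" corresponds to the left-to-right arrangement,
    layout="top_bottom" to the top-to-bottom arrangement.
    """
    h, w = len(frame), len(frame[0])
    if layout == "side_by_side":
        if w % n_views:
            raise ValueError("frame width is not divisible by n_views")
        vw = w // n_views
        return [[row[i * vw:(i + 1) * vw] for row in frame]
                for i in range(n_views)]
    if layout == "top_bottom":
        if h % n_views:
            raise ValueError("frame height is not divisible by n_views")
        vh = h // n_views
        return [[row[:] for row in frame[i * vh:(i + 1) * vh]]
                for i in range(n_views)]
    raise ValueError("unknown layout: " + layout)
```

For instance, `left, right = split_stereo_frame(frame)` separates a two-picture left-to-right frame into its two images.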

The color conversion unit 211 converts the improved images into a perceptually uniform color space, such as Lab, so that pixels can be compared more accurately in the disparity estimation modules 212 and 213. The output of the color conversion unit 211 is connected to the initial disparity estimation module 212. The initial disparity estimate is obtained with conventional stereo matching algorithms known from the prior art, and also on the basis of random noise images. A random noise image is defined as a grayscale image whose pixel intensities are chosen at random from the range between the minimum and maximum intensity. For eight-bit images, the intensity range is 0 to 255.
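A random noise image as defined above can be generated with a few lines of code. This sketch assumes a uniform distribution over the intensity range, which the text does not state explicitly; the function name and the `seed` parameter are hypothetical, and how the noise image enters the initial disparity estimate is not shown.

```python
import random

def random_noise_image(width, height, bits=8, seed=None):
    """Grayscale image whose intensities are drawn uniformly at random
    from [0, 2**bits - 1]; for 8-bit images that is 0..255."""
    rng = random.Random(seed)  # a fixed seed makes the image reproducible
    max_val = (1 << bits) - 1
    return [[rng.randint(0, max_val) for _ in range(width)]
            for _ in range(height)]
```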

Module 212 generates rough disparity maps, which are then refined in the disparity refinement module 213 according to the claimed invention. The initial disparity estimation module 212 and the disparity refinement module 213 perform stereo matching and refinement on a computing device with multiple parallel computing units.

The refined, high-quality disparity map 214 is the output of the disparity refinement module 213. Note that the disparity estimation stage 202 may generate several disparity maps for the images extracted from the stereo frame 209. Several disparity maps, paired with the corresponding images, allow more accurate synthesis of the virtual image. For simplicity, however, the case of a single disparity map is illustrated.

The resulting disparity map 214, together with the corresponding image from the three-dimensional video frame, may be written to a storage device as a video file for subsequent transmission to the 3D playback device via the transmission unit 215. Transmission may also take place without this optional storage step. The transmission unit 215 packs the disparity map(s) and image(s) into three-dimensional video frames and, after optional compression, transmits them to the 3D playback device. Such transmission may be carried out by connecting the transmission unit 215 to the receiver module 217 in the 3D playback device. The connection may be wired or wireless.

In the image synthesis complex 203, module 217 uses the images and the disparity map extracted from the three-dimensional video frame to produce a number of virtual images equal to the number of virtual cameras. Note that the number of virtual images may exceed the number of images captured in the three-dimensional video frame. The number of virtual images to be generated depends on the number of views that the 3D display can reproduce. Once the number of views and the virtual camera parameters are determined, the image synthesis module 217 generates the required number of virtual images 218 in accordance with the claimed method.

The set of virtual images 218 is fed to the input of the 3D playback complex 204, which includes a 3D display 221 and an image multiplexing unit 219. The 3D display 221 is a conventional digital display fitted with an optical system, such as a lenticular sheet or a parallax barrier. Such an optical system makes only certain pixels visible, depending on the position of the observer. Accordingly, the image multiplexer 219 generates a multiplexed image 220 from the set of virtual images 218 in accordance with the optical system. The multiplexed image 220 is then output to the 3D display 221, and the optical system performs the demultiplexing optically.
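The multiplexing step can be pictured with a deliberately simplified Python sketch. It assumes a hypothetical layout in which pixel column x of the multiplexed image is taken from view number x mod N; the actual pattern depends on the optical system of the display and is not specified in this description. The function name `multiplex_views` is likewise an assumption.

```python
def multiplex_views(views):
    """Interleave N equally sized views column by column.

    Assumption: column x of the output comes from view (x mod N). Real
    lenticular or parallax-barrier displays may need a different pattern.
    Each view is a list of rows; each row is a list of pixel values.
    """
    n = len(views)
    height = len(views[0])
    width = len(views[0][0])
    return [[views[x % n][y][x] for x in range(width)]
            for y in range(height)]

# Two 4x2 views filled with the constants 0 and 1, respectively.
views = [[[v] * 4 for _ in range(2)] for v in range(2)]
multiplexed = multiplex_views(views)
```

With these two constant views the output columns alternate between the first and the second view.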

Fig. 3 shows the structure of the computing device used for disparity estimation and image synthesis. The device includes a data memory unit 301, which stores images, disparity maps, views, camera parameters, and intermediate data needed for the computations. The input/output device 303 communicates with external devices such as a data memory driver, a network, cameras, and a 3D display. The input/output device 303 sends data to and receives data from the data memory unit 301.

The computing unit 304 consists of massively parallel computing modules, each of which can operate on different data in parallel according to instructions stored as programs in the program memory unit 302. The program memory unit 302 stores instructions that implement the claimed method for disparity estimation and image synthesis. The program memory unit 302, the computing unit, and the input/output device are interconnected by a common data bus 305.

Fig. 4 shows a detailed block diagram of the parallel computing architecture used for disparity estimation and disparity map refinement according to an embodiment of the claimed invention. The input data for disparity estimation consist of the refined images and the initial disparity map 403. At least two refined images are required for disparity estimation. One is the reference image 401 and the other is the matching image 402. The initial disparity map 403 stores the disparity values refined in the previous stages. This initial disparity map 403 is refined in accordance with the claimed invention, yielding the refined disparity map 410.

The disparity estimation and refinement method according to the claimed invention is designed so that the disparity map for one row can be refined independently of the other rows. This property can be exploited by a computing device 406 with a large number of parallel computing modules to speed up disparity estimation significantly. The computing device 406 operates according to program 412 for disparity estimation and refinement, which is stored in the program memory unit 411. Program 412 corresponds to the disparity estimation and refinement method described in detail below.

The row-splitting module 404 divides the input images 401, 402, 403 into sets of rows that can be processed independently and stores these sets in the data memory buffer 405. The sets may consist of either horizontal image rows or vertical image columns. The number of rows depends on the image size and equals the width or the height of the image being processed.

The computing device 406 includes several computing modules, each of which processes its assigned set of rows in parallel with the other modules and generates the corresponding row of refined disparity. The refined-disparity rows are stored in the data memory buffer 408. The row-merging module 409 assembles the refined disparity map 410 from the refined-disparity rows stored in the data memory buffer 408.

Note that the number of computing modules may differ from the number of data sets. To handle this, the computation scheduling module 407 matches the rows produced by the row-splitting module 404 to the computing modules that are idle and available for computation. Once a computing module starts processing data, it is marked "busy". Once it has produced a refined-disparity row, it is marked "free". The computation scheduling module 407 also controls the row-merging module 409, sending it a control signal whenever new refined-disparity rows become available.
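The split, schedule, and merge flow described for Fig. 4 can be sketched in Python with a worker pool standing in for the computing modules. This is only an analogy under stated assumptions: `refine_row` is a hypothetical placeholder for the per-row refinement, the thread pool plays the role of the scheduler 407 (handing rows to whichever worker is free), and a real implementation would pass the corresponding rows of the reference and matching images as well.

```python
from concurrent.futures import ThreadPoolExecutor

def refine_row(row):
    """Hypothetical per-row refinement; this placeholder just copies the row."""
    return list(row)

def refine_map_in_parallel(disparity_map, num_modules=4, refine=refine_row):
    """Split a map into rows, refine them on a worker pool, merge the results."""
    # Role of the row-splitting module 404: one work item per row.
    rows = [list(r) for r in disparity_map]
    # Role of modules 406/407: the pool assigns rows to free workers, so the
    # number of rows need not equal the number of workers.
    with ThreadPoolExecutor(max_workers=num_modules) as pool:
        refined_rows = list(pool.map(refine, rows))  # map() keeps row order
    # Role of the row-merging module 409: rows are reassembled in order.
    return refined_rows

result = refine_map_in_parallel([[0, 1], [2, 3], [4, 5]], num_modules=2,
                                refine=lambda r: [v + 1 for v in r])
```

The pool handles three rows with two workers here, which mirrors the case where the number of computing modules differs from the number of data sets.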

Fig. 5 shows the sequence of operations of the preprocessing module 210 for a stereo frame 501. The stereo frame 501 may have a left-right orientation, as shown in view 601 of Fig. 6, or a top-bottom orientation, as shown in view 602. The stereo frame orientation module 502 automatically determines the orientation type from the aspect ratio of the frame. Then, given the orientation of the stereo frame and the number of images, the coordinates of images 504 and 505 are determined in the stereo frame splitting module 503. The lens distortion compensation module 506 and the refinement module 507 perform standard procedures known in the art. In this way the reference image 509 and the matching image 508 are formed, which are used for disparity estimation in the subsequent stages.
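A minimal Python sketch of the orientation detection and splitting steps follows. The decision rule is an assumption: the description says only that the orientation is derived from the aspect ratio, so the sketch compares the frame's aspect ratio with the ratios expected for two images of a hypothetical known aspect (`single_aspect`) packed side by side or one above the other. The function names are also illustrative.

```python
def detect_orientation(width, height, single_aspect=4 / 3):
    """Guess the packing of a two-image stereo frame from its aspect ratio.

    Assumption: each packed image has aspect ratio `single_aspect`.
    """
    frame_aspect = width / height
    side_by_side = 2 * single_aspect  # expected ratio for left-right packing
    top_bottom = single_aspect / 2    # expected ratio for top-bottom packing
    if abs(frame_aspect - side_by_side) <= abs(frame_aspect - top_bottom):
        return "left-right"
    return "top-bottom"

def split_stereo_frame(frame, orientation):
    """Cut a two-image frame (list of rows) into its two images."""
    height, width = len(frame), len(frame[0])
    if orientation == "left-right":
        half = width // 2
        return [r[:half] for r in frame], [r[half:] for r in frame]
    half = height // 2
    return frame[:half], frame[half:]

frame = [[1, 2, 3, 4], [5, 6, 7, 8]]
first, second = split_stereo_frame(frame, "left-right")
```

Which of the two resulting images serves as the reference image is not decided by this sketch.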

Consider the step-by-step implementation of the method for estimating and refining the disparity map for a pair of stereo images (Fig. 7). The first step of the disparity map refinement method adapts the filter strength to the iteration number (Step 701). In the preferred embodiment of the claimed invention, the equation for adapting the filter strength to the iteration number (the function σ(k)) is linear:

σ(k) = a₁·k + b₁,

where k is the iteration number,

a₁, b₁ are linear coefficients.

Other forms of the function σ(k) are also admissible and do not affect the scope of protection of the claimed invention.

The next step of the method estimates the filter kernel size (Step 702). In the preferred embodiment of the claimed invention, the equation for estimating the filter kernel size as a function of the iteration number (the function KS(k)) is linear:

KS(k) = a₂·k + b₂,

where k is the iteration number,

a₂, b₂ are linear coefficients.

Other forms of the function KS(k) are also admissible and do not affect the scope of protection of the claimed invention.
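The two per-iteration schedules of Steps 701 and 702 can be sketched in Python as follows, assuming both have the linear form a·k + b. The coefficient values in the example are hypothetical, as the description gives none; rounding the kernel size to an integer of at least 1 is also an assumption, and whether the values grow or shrink with k depends on the sign of the chosen coefficients.

```python
def filter_strength(k, a1, b1):
    """Step 701 (assumed linear form): sigma(k) = a1 * k + b1."""
    return a1 * k + b1

def kernel_size(k, a2, b2):
    """Step 702 (assumed linear form): KS(k) = a2 * k + b2.

    Assumption: the result is rounded to an integer and clamped to >= 1.
    """
    return max(1, int(round(a2 * k + b2)))

# Hypothetical coefficients: both schedules shrink as iterations proceed.
for k in range(1, 4):
    print(k, filter_strength(k, a1=-0.5, b1=5.0), kernel_size(k, a2=-2, b2=11))
```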

After the filter strength has been computed and the filter kernel size estimated, the disparity map for the reference image is refined using information from the matching image (Steps 703-704). Hereinafter, the reference image is the color image of the stereo pair for which disparity is estimated, and the matching image is the other color image of the stereo pair.

The disparity map at the k-th iteration is represented as

[equation not reproduced]

where d_k(x_c, y_c) denotes the disparity map at the k-th iteration for the current pixel with coordinates (x_c, y_c),

d_{k-1}(x_r, y_r) denotes the disparity map at the (k-1)-th iteration for the reference pixel with coordinates (x_r = x_c + p, y_r = y_c + s),

w_r denotes the weight of the reference pixel,

the index p varies in the X direction [limits not reproduced],

the index s varies in the Y direction [limits not reproduced],

and the normalizing factor is computed as [equation not reproduced].

In the preferred embodiment of the claimed invention, the filter window dimensions L and K are set to L = K = KS(k), using the filter kernel size function evaluated at Step 702. However, L and K may also be set independently. This may be necessary for devices that process data using line memory. In such cases the vertical radius of the filter kernel is limited by the number of lines, while the horizontal radius can be set arbitrarily to the desired value.

To reduce computation, the disparity filter can be separated into independent operations performed in two passes. The first pass processes rows (Step 703). The second pass processes columns (Step 704).

[equations not reproduced]

where d_row_k(x_c, y_c) is the result of the row-wise filtering,

d_k(x_c, y_c) is the final result after column-wise filtering,

the normalizing factor for the row-wise filter is computed as [equation not reproduced],

and the normalizing factor for the column-wise filter is computed as [equation not reproduced].

Filtering is performed in a window of size L×K. All pixels belonging to this window are defined as reference pixels. The pixels in the matching image to which the reference pixels are mapped by their disparity vectors are defined as matching pixels (Fig. 8). The proposed filter assigns higher weights to pixels that are more similar to the current pixel. In the preferred embodiment of the claimed invention, the disparity filter weights are computed by comparing pixel neighborhoods rather than individual pixels. This strengthens the pixel similarity criterion.

In this disparity estimation and refinement method, a weight reflects both the similarity of the current pixel to the reference pixel and the similarity of the reference pixel to the matching pixel. Accordingly, to determine the similarity of the current pixel to the reference pixel, the neighborhood of the current pixel is compared with the neighborhood of the reference pixel. Then, to determine the similarity of the reference pixel to the matching pixel, the neighborhood of the reference pixel is compared with the neighborhood of the matching pixel (example comparisons are shown by arrows in Fig. 8). In this case the disparity filter weights are computed as follows

[equation not reproduced]

where C() denotes the function used to compare pixel neighborhoods,

σ_r and σ_t are the parameters that control the filter strength. In the preferred embodiment of the claimed invention they are set to σ_r = σ_t = σ(k), using the filter strength adaptation function evaluated at Step 701. However, σ_r and σ_t may also be set independently, which allows pixels from the reference and matching images to be penalized differently.
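The filter update for a single pixel can be sketched in Python, with the caveat that the update and weight formulas are not reproduced in this text. The sketch therefore rests on several assumptions that should not be read as the patented formulas: the update is a normalized weighted average of previous-iteration disparities over the window; each weight is a product of two exponentials, exp(-C_r/σ_r)·exp(-C_t/σ_t); C() is the absolute intensity difference of single grayscale pixels (neighborhood size 1); the window spans -L//2..L//2 and -K//2..K//2; and the matching pixel lies at x_r - d. All names are illustrative.

```python
import math

def refine_pixel(ref, tgt, disp, xc, yc, L, K, sigma_r, sigma_t):
    """One assumed filter update for pixel (xc, yc).

    ref, tgt: reference and matching grayscale images (lists of rows).
    disp: disparity map of the previous iteration (list of rows).
    """
    height, width = len(ref), len(ref[0])
    num = 0.0   # weighted sum of previous disparities
    norm = 0.0  # normalizing factor (sum of weights)
    for s in range(-(K // 2), K // 2 + 1):
        for p in range(-(L // 2), L // 2 + 1):
            xr, yr = xc + p, yc + s              # reference pixel
            if not (0 <= xr < width and 0 <= yr < height):
                continue
            d = disp[yr][xr]
            xt = xr - int(round(d))              # matching pixel (assumed sign)
            if not 0 <= xt < width:
                continue
            c_ref = abs(ref[yc][xc] - ref[yr][xr])   # current vs reference
            c_tgt = abs(ref[yr][xr] - tgt[yr][xt])   # reference vs matching
            w = math.exp(-c_ref / sigma_r) * math.exp(-c_tgt / sigma_t)
            num += w * d
            norm += w
    return num / norm if norm > 0 else disp[yc][xc]

# Uniform images with a constant disparity of 2 leave the value unchanged.
ref = [[10] * 8 for _ in range(3)]
tgt = [[10] * 8 for _ in range(3)]
disp = [[2] * 8 for _ in range(3)]
value = refine_pixel(ref, tgt, disp, xc=4, yc=1, L=3, K=3,
                     sigma_r=5.0, sigma_t=5.0)
```

Setting `sigma_r` and `sigma_t` to different values corresponds to penalizing dissimilarity in the reference and matching images differently.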

To increase the speed of the method at a small cost in quality, it makes sense to compare only the pixels themselves instead of their neighborhoods. This can be regarded as the limiting case of the method in which the neighborhood dimensions M and N are equal to one. Computing the filter weights this way may yield a somewhat noisy disparity map, so some post-processing is required to remove inconsistencies in the disparity map. For this purpose, a disparity post-processing step (Step 707) is performed to obtain the final disparity estimate.

An important step of the disparity refinement method is the computation and evaluation of the convergence criterion (Steps 705-706). At each iteration, the disparity refinement method evaluates the convergence criteria of the filtering process. In the preferred embodiment of the claimed invention, two ways of evaluating the convergence criterion are described:

- analyzing the difference between successive estimates of the disparity map;

- applying a hard threshold to the iteration number of the algorithm.

The first way of checking the convergence of the algorithm is to compute the residual image (difference) between successive estimates of the disparity map. The sum of the residual pixels must not exceed the disparity estimation convergence threshold T_dec1. This is formulated as follows

[equation not reproduced]

where d_k and d_{k-1} are the disparity estimates at the k-th and (k-1)-th iterations of the algorithm.

The second way of evaluating the convergence criterion is a hard threshold on the number of iterations. If the number of iterations reaches the disparity estimation convergence threshold T_dec2, the filtering process is terminated. This is formulated as follows

k ≥ T_dec2,

where k is the number of the current disparity estimation iteration.

The first convergence criterion can be used for a rigorous assessment of the algorithm's results, whereas the second criterion is preferable for devices with a minimal configuration, where simplicity of the device is important.
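Both convergence tests can be sketched in a few lines of Python. The first test assumes that the residual is the sum of absolute per-pixel differences between successive disparity maps; the description speaks only of the "sum of the residual pixels", so the absolute value is an assumption. The function and parameter names are illustrative.

```python
def has_converged(d_k, d_prev, k, t_dec1=None, t_dec2=None):
    """Return True when either enabled convergence criterion is met.

    d_k, d_prev: disparity maps (lists of rows) at iterations k and k-1.
    t_dec1: residual threshold (first criterion), or None to disable it.
    t_dec2: iteration threshold (second criterion), or None to disable it.
    """
    # Second criterion: hard threshold on the iteration number, k >= T_dec2.
    if t_dec2 is not None and k >= t_dec2:
        return True
    # First criterion: the summed residual must not exceed T_dec1
    # (assumed here to be a sum of absolute differences).
    if t_dec1 is not None:
        residual = sum(abs(a - b)
                       for row_k, row_p in zip(d_k, d_prev)
                       for a, b in zip(row_k, row_p))
        return residual <= t_dec1
    return False
```

A device with a minimal configuration could pass only `t_dec2` and skip the residual computation entirely.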

The final step of the disparity refinement method is disparity post-processing (Step 707). In the preferred embodiment of the claimed invention, a median filter is used for post-processing; it is a simple and reliable means of removing impulse noise and small distortions in the disparity map.
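A median filter for disparity post-processing can be sketched in pure Python as below. The 3×3 window (`radius=1`) and the handling of borders by shrinking the window are assumptions for this example; the description does not specify either. `statistics.median_low` is used so that the output is always a value actually present in the window.

```python
from statistics import median_low

def median_filter(disp, radius=1):
    """Apply a (2*radius+1)-square median filter to a disparity map.

    Assumption: near the borders the window is clipped to the image.
    """
    height, width = len(disp), len(disp[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [disp[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            row.append(median_low(window))
        out.append(row)
    return out

# A single impulse (9) in an otherwise constant map is removed.
noisy = [[2, 2, 2], [2, 9, 2], [2, 2, 2]]
cleaned = median_filter(noisy)
```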

Fig. 9 shows the disparity estimation results of the proposed filter according to the invention. The left image of the stereo pair for which disparity was estimated is shown in Fig. 9 (view 9.1). Fig. 9 (view 9.2) shows the disparity estimate produced by the prior-art filter [4], and Fig. 9 (view 9.3) shows the disparity estimate produced by the proposed filter according to the invention. Comparing views 9.2 and 9.3 shows that the claimed method gives a better result, especially in homogeneous (smooth) image regions and in regions with periodic structure. At the same time, the efficiency of the method is comparable to that of the known filter [4].

Fig. 10 shows a detailed block diagram of the parallel computing system used to generate multiple images (views) according to this embodiment of the claimed invention. The input data for generating the images consist of the reference color image 1001, the disparity map 1002, and the camera parameters 1003. The system outputs N synthesized images 1010.

The method of generating virtual images according to the claimed invention is designed so that any row of the N virtual images can be synthesized independently of the other rows. This important property can be exploited by a computing device 1006 with a large number of modules that compute in parallel, which substantially accelerates the synthesis of virtual images. Computing device 1006 operates in accordance with a program 1012 for generating multiple images, which is stored in program memory unit 1011. Program 1012 corresponds to the method of generating virtual images described in detail below.

Module 1004 for splitting images into rows divides the incoming images 1001 and 1002 into sets of rows that can be processed independently; these sets of rows are stored in data memory buffer 1005. Note that each set of rows consists of horizontal rows of the reference image. The number K of row sets depends on the image size and may be equal to the height of the reference image. For devices with a minimal configuration, K may be smaller than the height of the reference image. In that case, the reference image must be loaded into data memory buffer 1005 through a separate data loader.

Computing device 1006 includes several computing modules, each of which processes the set of rows assigned to it in parallel with the other modules and generates the corresponding row of each of the N images. While processing the rows of the reference image, one computing module of computing device 1006 generates N corresponding rows of the virtual images; in other words, it generates the corresponding row for every virtual image. Data memory buffer 1008 must therefore be able to hold K·Width·N values, where N is the number of virtual images, Width is the width of the reference image, and K is the number of rows available in data memory buffer 1005. After all rows of the N virtual images have been generated, row merging module 1009 assembles the N images 1010 from the image rows stored in data memory buffer 1008.

Note that the number of computing modules may not match the number of data sets. To resolve this, computation scheduling module 1007 matches the number of rows produced by row splitting module 1004 to the number of computing modules that are free for computation. Once a computing module starts processing data, it is marked as "busy". When a row of the N images has been formed, the computing module is marked as "free". Computation scheduling module 1007 also controls row merging module 1009 by sending a control signal when new rows of the images are ready.
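The row-wise pipeline described above (split into rows, process rows on free computing modules, merge rows into N images) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the per-row synthesis is replaced by a placeholder that shifts pixels by a multiple of the disparity, and a standard thread pool stands in for the scheduling module, since it hands each row to whichever worker is free.

```python
from concurrent.futures import ThreadPoolExecutor


def synthesize_row(color_row, disparity_row, n_views):
    """Placeholder per-row synthesis: returns one output row per view.

    Hypothetical stand-in: view v shifts each pixel by v * disparity.
    """
    width = len(color_row)
    rows = []
    for v in range(n_views):
        out = [None] * width  # None marks pixels that received no value
        for x in range(width):
            target = x + v * disparity_row[x]
            if 0 <= target < width:
                out[target] = color_row[x]
        rows.append(out)
    return rows


def generate_views(color, disparity, n_views, n_workers=4):
    """Split into rows, process rows in parallel, merge rows into N images."""
    height = len(color)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in row order, which simplifies the merge step.
        per_row = list(pool.map(
            lambda y: synthesize_row(color[y], disparity[y], n_views),
            range(height)))
    # per_row[y][v] is row y of view v; regroup into one image per view.
    return [[per_row[y][v] for y in range(height)] for v in range(n_views)]


color = [[10, 20, 30, 40], [50, 60, 70, 80]]
disparity = [[0, 0, 1, 1], [1, 1, 0, 0]]
views = generate_views(color, disparity, n_views=3)
# Output buffer capacity K * Width * N, here with K equal to the image height.
buffer_values = len(color) * len(color[0]) * 3
```

Because every row is processed without reference to any other row, the number of workers can be chosen freely without changing the result.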

Consider the step-by-step operation of the method of generating virtual images from a reference image obtained from a video camera and the corresponding disparity map (Fig. 11). For normal operation the method requires the camera parameters, namely the camera projection parameters and the parameters of the camera's orientation in space, which are essential for simulating a virtual camera. These parameters can be determined in advance and stored in ROM. Alternatively, they can be transmitted together with the image and the disparity data. The first step of the method is to set the number of virtual images to be generated (Step 1101). The number of images can be fixed and stored in ROM. This is useful when the playback device, and hence the number of images to be generated, is known in advance. However, it requires reprogramming the ROM with the new number of images whenever the playback device is changed. These virtual images can also be written to RAM. In that case, the number of virtual images and the camera parameters must be supplied.

It is known from the prior art that the projective transformation of a real camera is described by a projection matrix M, which is composed of two matrices: the projection matrix K and the matrix [R|t] of the camera's orientation in space. K defines the projection rule, that is, how a three-dimensional point of the real world is mapped onto the image plane inside the camera, while [R|t] describes the position of the camera relative to the origin of the world coordinate system. In what follows, the projection matrix M of the camera is defined as

M = K·[R|t],

where K is the projection matrix,

[R|t] is the matrix of the camera's orientation in space.

The projection matrix K is defined as

$$K = \begin{pmatrix} f_x & 0 & x_0 \\ 0 & f_y & y_0 \\ 0 & 0 & 1 \end{pmatrix},$$

where (f_x, f_y) are the focal lengths, given separately to model a rectangular photosensitive element,

(x₀, y₀) is the principal point, the coordinates of the image center.

The matrix [R|t] of the camera's orientation in space is defined as

$$[R|t] = \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{pmatrix},$$

where R is the rotation matrix describing the orientation of the camera with respect to the origin of the world coordinate system;

t is the translation vector from the origin of the world coordinates.
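As an illustration of how M = K·[R|t] maps a world point to a pixel, the following Python sketch builds both matrices and projects one point. The helper names and the numeric camera parameters are assumptions chosen for the example; they do not come from the source.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def intrinsic_matrix(fx, fy, x0, y0):
    """3x3 projection matrix K built from focal lengths and principal point."""
    return [[fx, 0.0, x0], [0.0, fy, y0], [0.0, 0.0, 1.0]]


def extrinsic_matrix(rotation, translation):
    """Concatenate a 3x3 rotation and a 3-vector into the 3x4 matrix [R|t]."""
    return [list(rotation[i]) + [translation[i]] for i in range(3)]


def project(m, point):
    """Project a 3D point with the 3x4 matrix M and dehomogenize."""
    col = [[point[0]], [point[1]], [point[2]], [1.0]]
    u, v, w = (row[0] for row in matmul(m, col))
    return u / w, v / w


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = intrinsic_matrix(fx=800.0, fy=800.0, x0=320.0, y0=240.0)  # example values
M = matmul(K, extrinsic_matrix(identity, [0.0, 0.0, 0.0]))
pixel = project(M, (0.1, -0.05, 2.0))
```

With the identity rotation and zero translation used here, the pixel coordinates reduce to f_x·X/Z + x₀ and f_y·Y/Z + y₀.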

Since the goal is to generate a set of images for sequential output to a device that displays an image stream, the input data required for normal operation can be simplified. The following parameters are needed to generate the set of images:

1. The intrinsic matrix K of the video camera.

2. The distance b between adjacent virtual images.

3. The number n of images (views) located on one side of the real camera (left or right).

The configuration of the virtual and real cameras is shown in Fig. 12. If n is the number of images on one side of the real video camera, the total number of images is N = 2·n + 1. Without loss of generality, the origin of the world coordinate system can be fixed at the center of the real camera (C₀ in Fig. 12).

The rotation matrix then reduces to the identity matrix, R = I.

The optical centers of all virtual cameras are assumed to lie on one horizontal line. The translation vector for the cameras located to the right of the real camera can therefore be written as:

where i is the index of the virtual image counted to the right of the real camera. Correspondingly, the translation vector for the cameras located to the left of the real camera can be written as:

where i is the index of the virtual image counted to the left of the real camera. As a result, the matrix [R|t] takes the form:
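The camera layout just described can be sketched as follows. The sketch assumes that the i-th camera on either side is displaced by i·b along the horizontal axis and that offsets to the left are negative; the sign convention and the function names are assumptions, since the corresponding formulas are not reproduced in this text.

```python
def camera_offsets(n, b):
    """Horizontal offsets t_x for N = 2*n + 1 cameras, listed left to right.

    Assumed sign convention: negative offsets lie to the left of the real
    camera, which sits at offset 0.
    """
    return [i * b for i in range(-n, n + 1)]


def extrinsics(t_x):
    """3x4 matrix [R|t] with R = I and t = (t_x, 0, 0)."""
    return [[1.0, 0.0, 0.0, t_x],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]


offsets = camera_offsets(n=2, b=0.05)   # example values for n and b
matrices = [extrinsics(t_x) for t_x in offsets]
```

For n = 2 this yields N = 5 matrices, the middle one belonging to the real camera.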

The next step of the method of generating virtual images is to model as many virtual cameras as there are images (Step 1102). The main purpose of modeling the virtual cameras is to establish a correspondence between the real camera and each virtual camera. This makes it possible to feed the virtual images correctly to any device that supports the usual projection equations. Modeling a virtual camera consists of constructing a projection matrix P from the matrix K of the real camera and a view matrix V from the matrix [R|t] of the real camera. The advantage of describing the virtual camera in this way is that real and synthesized content can be combined automatically.

Let us consider the construction of the matrices P and V in detail.

The matrices P and V should be regarded as analogues of the matrices K and [R|t]. The purpose of P is to describe the transformation from a three-dimensional image to a two-dimensional image. In other words, P describes the projective transformation of the set of three-dimensional points contained in the viewing frustum of the virtual camera (the viewing volume truncated by parallel planes) into a set of two-dimensional points. One important difference between the matrix P and the matrix [R|t] is that P preserves not only the two-dimensional coordinates, as [R|t] does, but also the z-coordinate of the three-dimensional point, which makes it possible to integrate synthesized content or to merge different real images.

Here, N and F are the near and far planes of the viewing frustum of the virtual camera. The viewing frustum is shown in Fig. 13 (view 13.1);

L and R are the coordinates of the left and right boundaries of the virtual image plane (Fig. 13, view 13.2);

T and B are the coordinates of the top and bottom boundaries of the virtual image plane (Fig. 13, view 13.2).

The parameters of the matrix P that correspond to the parameters of the matrix K must now be determined. For this purpose, the following equations are defined:

where W is the width of the image plane;

H is the height of the image plane.
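The formulas relating L, R, T and B to the entries of K are not reproduced in this text. As a hedged illustration only, the sketch below uses a mapping that is common in computer graphics: the frustum bounds are obtained by scaling the image extents around the principal point by N/f, and P takes the widely used OpenGL-style perspective layout. Both the mapping and the matrix layout are assumptions, and all function names are hypothetical.

```python
def frustum_from_intrinsics(fx, fy, x0, y0, width, height, near):
    """Frustum bounds on the near plane (assumed convention, y axis up)."""
    left = -x0 * near / fx
    right = (width - x0) * near / fx
    bottom = -(height - y0) * near / fy
    top = y0 * near / fy
    return left, right, bottom, top


def frustum_matrix(left, right, bottom, top, near, far):
    """4x4 perspective matrix in the common OpenGL-style layout."""
    return [
        [2 * near / (right - left), 0.0, (right + left) / (right - left), 0.0],
        [0.0, 2 * near / (top - bottom), (top + bottom) / (top - bottom), 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def apply(matrix, vec):
    """Multiply a 4x4 matrix by a 4-vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


near, far = 0.5, 10.0   # example values for N and F
l, r, b, t = frustum_from_intrinsics(800.0, 800.0, 320.0, 240.0, 640, 480, near)
P = frustum_matrix(l, r, b, t, near, far)
clip_near = apply(P, [0.0, 0.0, -near, 1.0])
clip_far = apply(P, [0.0, 0.0, -far, 1.0])
```

In this layout a point on the near plane maps to a normalized depth of -1 and a point on the far plane to +1, so the z-coordinate survives the projection, which is the property the description attributes to P.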

Using the above equations and choosing appropriate values for N and F, the projection matrix P corresponding to the matrix K can be derived. The view matrix V of the virtual images can now be derived from the matrix [R|t] of the video camera. The view matrix V serves the same purpose as the matrix [R|t]: both describe the location of the camera with respect to the origin of the world coordinate system. The matrix V is defined as follows:

From this the complete projection equation is obtained:

G = P·V,

where G is the complete projection matrix of the virtual camera.
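The formula for V is not reproduced in this text. Assuming that V is the 4×4 homogeneous extension of [R|t], and using R = I and t = (t_x, 0, 0) as established for the virtual cameras, one consistent form would be the following (an assumption, not the source's confirmed definition):

$$
V = \begin{pmatrix} R & t \\ 0^{T} & 1 \end{pmatrix}
  = \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},
\qquad G = P \cdot V .
$$

Under this assumption G differs between virtual cameras only through t_x, since P is shared by all of them.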

Once the corresponding matrix G has been derived for each virtual camera, the generation of the set of images can be started in accordance with the virtual image formation function (Step 1104). The virtual image formation function creates multiple virtual views (images) from an image captured by the real camera and a depth (disparity) map, together with the precomputed projection matrices G_N of the virtual cameras. Hereinafter, G_N denotes the set containing a matrix G for each virtual camera. To generate a virtual image from the real image and its corresponding depth map, each image pixel is back-projected into three-dimensional space, and the reconstructed three-dimensional points are then projected towards the virtual view described by its projection matrix G (Fig. 14). The back-projection of a pixel into three-dimensional space can be formulated as

where x is a vector of the homogeneous coordinates (x, y, 1) of the pixel;

X is a vector of the homogeneous coordinates (X, Y, Z, 1) of the three-dimensional point for pixel x;

z is a scalar, the depth value from the depth map that corresponds to pixel x;

M⁺ is the pseudo-inverse of the projection matrix M of the video camera, so that M·M⁺ = I;

C is a vector of the coordinates of the camera center. Given the arrangement of the virtual cameras in space described above, this vector is (t_x, 0, 0, 1).

For each pixel of the reference image, the three-dimensional point X is reconstructed using equation (1). The reconstructed point cloud is then projected onto the projection plane of the virtual camera. The projection from three-dimensional space to two-dimensional space is performed as follows:

where G is the projection matrix of the virtual camera defined earlier; X is the vector describing the three-dimensional point that corresponds to a pixel of the reference color image;

Y is the vector of homogeneous coordinates of the two-dimensional point projected onto the plane of the virtual camera. Along with the two-dimensional coordinates, the z-coordinate of the three-dimensional point X is preserved. The vector has the form (x_v, y_v, z_v, 1).

Equations (1) and (2) can now be restated to obtain a more compact form of the virtual image formation function. To do this, the vector X in equation (2) is replaced by its definition from equation (1), and expanding the brackets gives the following expression

Equation (3) can now be split into parts that depend on the input data (that is, the depth value and the image coordinates) and parts that are independent of the data and can therefore be computed in advance at the stage of estimating the parameters of the virtual image formation function (Step 1103). The data-independent parameters include the product of the matrix G and the vector C, which is defined as the vector A (A = G·C). The second parameter that can be computed in advance is the product of the matrices G and M⁺; the resulting matrix B is defined as B = G·M⁺. Equation (3) can now be restated as follows
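The formulas for equations (1) to (4) are not reproduced in this text. A chain that is consistent with the definitions of x, X, z, M⁺, C, A and B given above would read as follows; the placement of the depth scalar z in equation (1) is an assumption and is not confirmed by the source:

$$
\begin{aligned}
X &= z\,M^{+}x + C, && (1)\\
Y &= G\,X, && (2)\\
Y &= z\,G\,M^{+}x + G\,C, && (3)\\
Y &= z\,B\,x + A, \qquad A = G\,C,\quad B = G\,M^{+}. && (4)
\end{aligned}
$$

In this form only z and x change from pixel to pixel, while A and B are fixed for a given virtual camera.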

Equation (4) represents the virtual image formation function according to the described method of generating virtual images. Using equation (4), the virtual image associated with a particular virtual camera can be computed easily by transferring the image and depth data to the coordinates given by x. The matrices A and B must be computed and stored in memory for all N virtual cameras. The algorithm then synthesizes the virtual images according to the camera parameters, starting with A₁ and B₁ and ending with A_N and B_N.
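A per-row evaluation of the virtual image formation function might look like the sketch below. It assumes the form Y = z·B·x + A with a 4×3 matrix B and a 4-vector A, and it adds a simple depth test so that nearer points overwrite farther ones; the depth test, the function name and the numeric values of A and B are illustrative assumptions. The example values are made up so that the warp reduces to a horizontal shift inversely proportional to depth.

```python
def warp_row(color_row, depth_row, y, A, B):
    """Warp one image row with Y = z * B * x + A and a depth test.

    A is a 4-vector and B a 4x3 matrix, both precomputed per virtual camera.
    Returns the warped row; None marks pixels that received no value.
    """
    width = len(color_row)
    out = [None] * width
    zbuf = [float("inf")] * width
    for x in range(width):
        z = depth_row[x]
        pix = (x, y, 1.0)
        Y = [z * sum(B[r][c] * pix[c] for c in range(3)) + A[r]
             for r in range(4)]
        xv = int(round(Y[0] / Y[3]))  # dehomogenize the column coordinate
        if 0 <= xv < width and z < zbuf[xv]:
            zbuf[xv] = z              # nearer points overwrite farther ones
            out[xv] = color_row[x]
    return out


# Hypothetical parameters: the warp shifts each pixel by 4 / z columns.
B = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
A = [4.0, 0.0, 0.0, 0.0]
colors = ["a", "b", "c", "d", "e", "f"]
depths = [4.0, 4.0, 2.0, 2.0, 4.0, 4.0]
warped = warp_row(colors, depths, y=0, A=A, B=B)
```

In this example the near pixels (depth 2) move two columns and the far pixels (depth 4) move one column, which leaves unfilled positions in the output row; such gaps are the disocclusions treated in the following step.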

Once all the required virtual images have been synthesized, the method of generating virtual images processes the disocclusions (Step 1105). Disocclusion is illustrated in Fig. 14, where disocclusion regions are shown in black. A disocclusion is a region of the virtual image that becomes visible from the virtual viewpoint, whereas in the reference image this region is hidden by some object. Correct processing of disocclusions requires additional information, for example another image of the scene captured from a different viewpoint, to fill the resulting "holes". However, when the virtual views are not far from the reference view, interpolation from the boundary pixels of the disocclusion region gives quite acceptable results. During generation of the virtual images, disocclusion regions can be marked with a numeric marker for further processing.

The disocclusion filter is formulated as a special case of the bilateral filter in which the filter weights are computed only for pixels that lie outside the disocclusion region. The result of the weighted summation is assigned to the pixels that lie inside the disocclusion region. A filter of this kind propagates information from the defined regions of the image into the undefined regions (disocclusions). The filter can be applied repeatedly with an increasing radius to fill the disocclusion region iteratively. The disocclusion filter for a pixel I(x, y) in a rectangular window of size K in the horizontal direction and size L in the vertical direction is formulated as

where I(x+i, y+j) is a pixel of the source image;

I^∈(x, y) is the filtered pixel;

W(x+i, y+j) is the weight of pixel I(x+i, y+j). The weight W is formulated as follows

where ΔC_pq is the distance between the points p(x, y) and q(x+i, y+j) in color space, computed as the Euclidean distance; ΔJ_pq is the Euclidean distance between the points p and q in the image domain; c₁ and c₂ are predefined values that control the smoothing and refinement effect. According to equation (6), if the source pixel is marked as a disocclusion, I(x+i, y+j) = mark, the corresponding weight is zero. It follows that disocclusion pixels do not take part in the filtering defined by equation (5), and the refined image I^∈(x, y) is formed only from valid pixels, weighted by color similarity and spatial distance.
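The formulas for equations (5) and (6) are not reproduced in this text. As a hedged sketch only, a normalized weighted sum of the kind used in bilateral filtering, consistent with the symbol definitions above, could be written as follows. The summation limits, the normalization by the sum of weights and the exponential form of W are assumptions rather than the source's exact formulas:

$$
I^{\in}(x,y) = \frac{\sum_{i=-K/2}^{K/2}\,\sum_{j=-L/2}^{L/2} W(x+i,\,y+j)\; I(x+i,\,y+j)}{\sum_{i=-K/2}^{K/2}\,\sum_{j=-L/2}^{L/2} W(x+i,\,y+j)} \qquad (5)
$$

$$
W(x+i,\,y+j) =
\begin{cases}
0, & I(x+i,\,y+j) = \text{mark},\\[4pt]
\exp\!\left(-\left(\dfrac{\Delta C_{pq}}{c_1} + \dfrac{\Delta J_{pq}}{c_2}\right)\right), & \text{otherwise.}
\end{cases} \qquad (6)
$$

Whatever the exact form of the non-zero branch, the zero branch is what excludes disocclusion pixels from the sum.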

For large disocclusion regions, the disocclusion filter of equation (5) can be applied several times with a variable filter kernel size. The variable kernel size DKS(k) of the disocclusion filter, which adapts to the iteration number k, is defined as

DKS(k) = a_d·k + b_d,

where k is the iteration number,

a_d, b_d are linear coefficients.

Other forms of the function DKS(k) may also be used; this does not affect the scope of protection of the claimed invention.
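The iterative filling with a growing kernel can be sketched as follows. This is a simplified illustration, not the filter of equations (5) and (6): it uses only a spatial Gaussian weight, because a marked target pixel has no valid color from which to compute a color distance, and the coefficient values a_d = 2 and b_d = 3, the marker value and all function names are assumptions.

```python
import math

MARK = None  # marker for disoccluded pixels (an illustrative choice)


def kernel_size(k, a_d=2, b_d=3):
    """DKS(k) = a_d * k + b_d; the coefficient values are assumptions."""
    return a_d * k + b_d


def fill_pass(image, size, c2=4.0):
    """One pass: each marked pixel gets a weighted mean of the valid
    neighbours inside a size x size window (spatial weights only)."""
    h, w = len(image), len(image[0])
    half = size // 2
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if image[y][x] is not MARK:
                continue
            num = den = 0.0
            for j in range(-half, half + 1):
                for i in range(-half, half + 1):
                    yy, xx = y + j, x + i
                    if 0 <= yy < h and 0 <= xx < w and image[yy][xx] is not MARK:
                        wgt = math.exp(-(i * i + j * j) / c2)
                        num += wgt * image[yy][xx]
                        den += wgt
            if den > 0.0:          # marked neighbours contribute zero weight
                out[y][x] = num / den
    return out


def fill_disocclusions(image, max_iterations=10):
    """Repeat the pass with a growing window until no marked pixel remains."""
    for k in range(max_iterations):
        if all(p is not MARK for row in image for p in row):
            break
        image = fill_pass(image, kernel_size(k))
    return image


holes = [[10.0, MARK, MARK, MARK, MARK, MARK, 10.0],
         [10.0, MARK, MARK, MARK, MARK, MARK, 10.0]]
filled = fill_disocclusions(holes)
```

The first pass with a 3×3 window fills only the pixels next to valid data; the second pass with a 5×5 window reaches the middle of the gap.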

Другое улучшение фильтра дисокклюзии включает возможность неявного использования рекурсии, когда необходим только один буфер для ввода множества I, и вывода множества I^∈. В этом случае результат фильтрации пикселя I (x-1, y-1) дисокклюзии можно использовать для фильтрации следующего пикселя I (x, y) дисокклюзии. Это может значительно сократить число итераций.Another improvement in the disclusion filter includes the ability to implicitly use recursion when only one buffer is needed to enter the set I and output the set I ^∈ . In this case, the filtering result of the pixel I (x-1, y-1) of the disclusion can be used to filter the next pixel I (x, y) of the disclosure. This can significantly reduce the number of iterations.

A third improvement of the disocclusion filter is separable computation, in which the image rows are processed first and then the columns. The separable filter is defined as

where I_row(x, y) is the result of filtering performed along image rows;

I^∈_sep(x, y) is the final result of filtering performed along image columns.
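A rows-then-columns pass can be sketched as below. The sketch assumes a purely spatial exponential weight with a hypothetical parameter `c2`, and the names `MARK`, `filter_line`, and `filter_separable` are illustrative; it is not the filter formula of the patent, which is not reproduced in this text. Marked pixels receive zero weight in both passes.

```python
import math

MARK = None  # hypothetical sentinel for a disocclusion pixel


def filter_line(line, radius=1, c2=2.0):
    """1-D normalized weighted average that skips marked pixels."""
    out = []
    for x in range(len(line)):
        num = den = 0.0
        for i in range(-radius, radius + 1):
            xx = x + i
            if 0 <= xx < len(line) and line[xx] is not MARK:
                w = math.exp(-abs(i) / c2)  # assumed spatial weight
                num += w * line[xx]
                den += w
        out.append(num / den if den > 0.0 else MARK)
    return out


def filter_separable(img, radius=1):
    """Row pass (I_row) followed by a column pass on its result."""
    rows = [filter_line(r, radius) for r in img]
    cols = [filter_line(list(c), radius) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


result = filter_separable([[5, 5, 5], [5, MARK, 5], [5, 5, 5]])
print(result)
```

For a window of K by L pixels, the separable form visits K + L neighbors per pixel instead of K·L, which is the usual motivation for splitting a two-dimensional filter into two one-dimensional passes.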

REFERENCES

[1] US Patent Application No. 2004/0032980, P. V. Harman, "Image Conversion and Encoding Techniques".

[2] US Patent Application No. 2003/0007681, R. G. Baker, "Image Conversion and Encoding Techniques".

[3] US Patent Application No. 2005/0185711, H. Pfister, W. Matusik, "3D television system and method".

[4] WO Patent Application No. 2008/041167, F. Boughorbel, "Method and Filter for Recovery of Disparities in a Video Stream".

Claims

1. A system for generating and reproducing three-dimensional video images, comprising:
a generation complex, including:
more than one video stream generation device;
means for storing and transmitting video streams over the network to the modules for extracting stereo frames and evaluating disparity;
a generating module configured to extract stereo frames from a plurality of video streams;
a module for pre-processing stereo frames, configured to extract multiple projections of a three-dimensional scene from stereo frames and perform procedures for compensating for lens distortion and image alignment using an epipolar constraint;
the disparity assessment complex, which includes:
a color converter configured to convert images into a visually uniform color space;
a first disparity module configured to approximate disparity on a plurality of projections of a three-dimensional scene;
a second disparity module configured to re-refine the approximate disparity map;
a transmission unit configured to pack the disparity map and projections into three-dimensional video frames and, after optional compression, transfer them to a three-dimensional playback device;
a complex of image synthesis, including:
a receiving unit configured to receive and decompress three-dimensional video frames;
a synthesis unit configured to generate a plurality of virtual views of the observed three-dimensional scene for playback on a three-dimensional display;
playback complex, including:
a digital display with an optical system, configured to spatially separate the plurality of virtual views so that each eye observes its own view of the reproduced three-dimensional scene;
a multiplexer configured to prepare a multiplexed image in accordance with the characteristics of an optical system of a three-dimensional display and a plurality of virtual views.

2. The system according to claim 1, characterized in that the video stream generation devices are an arbitrary set of video cameras observing the same region of space and configured to form multiple video streams of a three-dimensional dynamic scene, and the means for storing and transmitting video streams over the network includes a storage device configured to store stereo images in the form of video files.

3. The system according to claim 1, characterized in that the digital display is equipped with an optical system configured to produce the three-dimensional image effect by means of a lenticular sheet or a parallax barrier.

4. The system according to claim 1, characterized in that the plurality of video images are stereo pairs of a dynamic scene.

5. The system according to claim 1, characterized in that it comprises means for synchronizing multiple cameras.

6. The system according to claim 1, characterized in that the color converter is configured to convert RGB images into a Lab color space.

7. The system according to claim 1, characterized in that the first disparity module is configured to initialize the disparity map through a stereo-matching procedure.

8. The system according to claim 1, characterized in that the first disparity module is configured to initialize the disparity map with arbitrary values.

9. The system according to claim 1, characterized in that for each projection of a three-dimensional scene, a separate disparity map is provided.

10. The system according to claim 1, characterized in that the storage means is capable of storing video files consisting of disparity maps grouped with the corresponding projections of a three-dimensional scene.

11. The system according to claim 1, characterized in that the disparity assessment blocks and the playback blocks are implemented as a computing device consisting of massively parallel computing units, each of which is designed for parallel data processing in accordance with the procedure recorded in the program memory.

12. The system according to claim 11, characterized in that the computing device is configured to refine disparity values for each individual line of the disparity map independently of the other lines of the map.

13. The system of claim 12, wherein the line on the disparity map is both an image row and an image column.

14. The system of claim 1, wherein the stereo frame pre-processing module is configured to automatically determine the orientation of the stereo frame based on the aspect ratio of the stereo frame.

15. A method for estimating and refining disparity values for a stereo pair, which consists in carrying out successive iterations of refinement of an initial disparity estimate and comprises the following operations:
performing an initial calculation of disparity;
adjusting the parameters of the disparity refinement filter depending on the iteration number;
evaluating the kernel size of the disparity refinement filter depending on the iteration number;
refining the disparity map according to the parameters and size of the filter kernel;
calculating and evaluating convergence criteria and, if they are met, stopping the filtering process;
performing post-processing of the disparity map.

16. The method according to claim 15, wherein the original stereo image is converted from the RGB color space to the Lab color space.

17. The method according to claim 15, wherein the initial disparity is represented as a grayscale noise image in which the pixel intensities are set randomly in the range from the minimum pixel intensity to the maximum pixel intensity, which for 8-bit images corresponds to an intensity range from 0 to 255.

18. The method according to claim 15, characterized in that the parameters of the disparity filter are adjusted in accordance with the iteration number according to the formula σ(k) = a₁·k + b₁,
where k is the iteration number; a₁, b₁ are linear coefficients.

19. The method according to claim 15, wherein the size of the filter kernel is evaluated in accordance with the iteration number according to the formula
KS(k) = a₂·k + b₂,
where k is the iteration number; a₂, b₂ are linear coefficients.

20. The method according to claim 15, characterized in that the disparity filtering at the k-th iteration is performed according to the formula

where d_k(x_c, y_c) denotes the disparity map at the k-th iteration for the current pixel with coordinates (x_c, y_c);
d_{k-1}(x_r, y_r) denotes the disparity map at the (k-1)-th iteration for the reference pixel with coordinates (x_r = x_c + p, y_r = y_c + s);
w_r denotes the weight of the reference pixel;
the index p varies from

to

in the X direction;
the index s varies from

to

in the Y direction;
the normalization factor is calculated as

.

21. The method according to claim 20, characterized in that the disparity filtering is performed in two stages, with row-wise image processing at the first stage and column-wise processing at the second stage, namely

where d_rowk(x_c, y_c) is the result of the row-wise filtering procedure;
d_k(x_c, y_c) is the final result of the column-wise filtering;
the normalization factor for the row filter is calculated as

and the normalization factor for the column filter is calculated as

22. The method according to either of claims 20 and 21, characterized in that the weight of the disparity refinement filter is calculated as follows

where C() denotes a function used to compare pixel neighborhoods;
σ_r is the penalty weight parameter for the reference pixel in the reference image;
σ_t is the penalty weight parameter for the combined pixel in the combined image;
(x_r, y_r) are the coordinates of the reference pixel;
(x_t, y_t) are the coordinates of the combined pixel.

23. The method according to claim 22, wherein the function used to compare pixel neighborhoods is defined as

where I_c(x_c, y_c) denotes the intensity of the current pixel with coordinates x_c and y_c;
I_r(x_r, y_r) denotes the intensity of the reference pixel with coordinates x_r and y_r;
the index i varies from

to

in the X direction;
the index j varies from

to

in the Y direction;
σ_n is a Gaussian parameter that controls the weight of a pixel according to its position relative to the current pixel, so that pixels far from the central pixel are assigned lower weights than pixels close to it, and the normalization factor is calculated as

24. The method according to claim 15, wherein the convergence criterion is evaluated according to the formula

where d_k and d_{k-1} denote the disparity estimates at the k-th and (k-1)-th iterations of the algorithm;
T_dec1 is the threshold value for the convergence of the disparity estimate.

25. The method according to claim 15, wherein the convergence criterion is evaluated according to the formula k ≥ T_dec2, where k is the sequence number of the current iteration of the disparity estimation; T_dec2 is the threshold value for the convergence of the disparity estimate.

26. The method according to claim 15, wherein the post-processing of the disparity map is performed using a median filter.

27. A method of generating virtual images from a color image and the corresponding disparity / depth map with known camera parameters, consisting of the following operations:
form models of virtual cameras;
determine the function of forming virtual images;
evaluate the parameters of the function of forming virtual images;
form virtual images in accordance with the function of forming virtual images;
eliminate existing occlusions on the generated virtual images.

28. The method according to claim 27, wherein the virtual cameras are described in terms of the matrices of the real cameras, and the virtual cameras are modeled according to the formula G = P·V, where G is the full projection matrix of the virtual camera; P is the projection matrix of the virtual camera, which describes how a three-dimensional point is mapped onto the plane of the virtual image and is derived from the matrix K of the real camera; V is the virtual camera matrix, which defines the position and orientation of the virtual camera in three-dimensional space relative to the reference point of the world coordinate system and is determined from the matrix [R | t] of the real camera.

29. The method according to p, characterized in that the projection matrix P of the virtual camera is defined as

where N and F are the near and far planes of the viewing frustum of the virtual camera;
L and R are the coordinates of the left and right boundaries of the plane of the virtual image;
T and B are the coordinates of the upper and lower boundaries of the plane of the virtual image.

30. The method according to claim 29, wherein the parameters L, R, B and T of the matrix P are determined from the parameters of the intrinsic matrix K of the real camera as follows

where W is the image width;
- image height;
(x₀, y₀) are the coordinates of the center of the image on the image plane;
(f_x, f_y) are the focal lengths, modeling a photosensitive element of rectangular shape.

31. The method according to claim 28, characterized in that the matrix V of the virtual camera is determined in accordance with the matrix [R | t] of the real camera as

where the submatrix R and the vector t are taken from the matrix [R | t] of the real camera, which describes the orientation and position of the camera relative to the reference point of the world coordinate system.

32. The method according to claim 27, wherein the function of forming virtual images is represented as Y = A + zBx,
where the vector A (A = G·C) is defined as the scalar product of the projection matrix G and the coordinate vector C of the camera center;
the matrix B (B = G·M⁺) is defined as the scalar product of the matrix G and the pseudoinverse projection matrix M⁺ of the real camera;
x is a homogeneous vector representing the coordinates of the pixel in the image of the real camera, x = (x, y, 1);
z is a scalar corresponding to the depth value obtained from the disparity/depth map at the coordinate x;
Y is a homogeneous vector representing the coordinates of the pixel in the virtual image, Y = (x, y, z, 1), where the third element corresponds to the depth of the pixel.

33. The method according to p, characterized in that they evaluate the parameters for the function of forming virtual images in the form of a preliminary calculation of the matrices A and B for each virtual camera.

34. The method according to claim 27, wherein the disocclusions in the generated virtual images are eliminated by occlusion filtering, the filter for the pixel I(x, y) of the virtual image in a rectangular window of size K in the horizontal direction and size L in the vertical direction being represented as

where I(x+i, y+j) is the original image pixel;
I^∈(x, y) is the filtered pixel;
W(x+i, y+j) is the weight of the pixel I(x+i, y+j).

35. The method according to claim 34, wherein the weight of the filter is calculated as

where ΔC_pq is the distance between the points p(x, y) and q(x+i, y+j) in the color space;
ΔJ_pq is the distance between the points p and q in the image area;
f(ΔC_pq, ΔJ_pq) is the function for calculating the pixel weight of the image.

36. The method according to claim 35, wherein the function for calculating the pixel weight of the image f(ΔC_pq, ΔJ_pq) is defined as

,
where c₁ and c₂ are predefined values that control the smoothing and refinement effect.

37. The method according to claim 35, wherein the distance between points in the color space ΔC_pq is calculated as the Euclidean distance.

38. The method according to claim 35, wherein the distance between the pixels in the image area ΔJ_pq is calculated as the Euclidean distance.

39. The method according to claim 34, wherein the filter for eliminating disocclusion is computed separably as

where I_row(x, y) is the result of filtering performed along image rows;
I^∈(x, y) is the final result of filtering performed along image columns.

40. The method according to claim 34, wherein the filter for eliminating occlusion is applied k times.

41. The method according to claim 34, wherein the size of the filter kernel in the horizontal direction K and the size of the filter kernel in the vertical direction L are determined using the function DKS(k), which determines the radius of the filter kernel in accordance with the iteration number k.

42. The method according to claim 41, wherein the function DKS(k) is defined as DKS(k) = a_d·k + b_d, where k is the iteration sequence number; a_d, b_d are linear coefficients.