HK1227131B - Source code translation

Info

Publication number: HK1227131B
Application number: HK17100730.6A
Authority: HK
Inventors: J. Beit-Aharon
Original assignee: Ab Initio Technology LLC
Priority date: 2013-12-06
Filing date: 2014-12-08
Publication date: 2020-08-21

Description

Source code translation

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Application Serial No. 61/912,594, filed on December 6, 2013.

Technical Field

This description relates to source code translation, and in particular to the translation of source code specified in one or more original software programming languages into one or more other, different software programming languages.

Background Art

In the field of software development, software engineers can choose to develop software in one or more of many different programming languages. At the time of this writing, some examples of modern programming languages commonly used by developers are Java, C#, and C++. In general, each programming language has its own advantages and disadvantages, and it is the software engineer's job to weigh them when choosing an appropriate programming language for a given application.

Over the years, the state of the art in programming languages has advanced, causing certain early programming languages to become less used, no longer supported, and/or obsolete. Some examples of such early programming languages are Basic and Fortran. Nevertheless, source code written in those early programming languages (often referred to as "legacy" code) has commonly remained in production for many years because it performs adequately. When such legacy code no longer functions adequately and changes to the code become necessary, however, it can be difficult to find software engineers who have the skills needed to update it.

For this reason, source-to-source compilers have been developed. Such a compiler receives a first software specification specified in a first programming language as input and generates a second software specification specified in a second, different programming language as output. Source-to-source compilers are used to translate legacy code into modern programming languages, which are more easily edited by software engineers skilled in the use of modern programming languages.

Summary of the Invention

A technical problem that is solved involves converting between a software specification containing source code in a procedural language and a software specification containing source code in another language that is not restricted to procedural programming constructs but operates using a different modality. For example, instead of execution being driven solely by control explicitly passed between different procedures, the other language may operate in a modality in which data flowing between different programming entities drives execution, alone or in combination with explicit control flow. Converting source code between languages with such fundamental differences involves more than literal translation between languages of different styles. For systems with source code in multiple languages, another technical problem that is solved involves providing source code for a new system that merges those multiple languages into a single, different language.

In one aspect, in general, a method for software specification translation includes: receiving a first software specification specified in a first programming language (e.g., COBOL); receiving a second software specification specified in a second programming language (e.g., COBOL); receiving a third software specification specified in a third programming language (in some embodiments, the third programming language (e.g., JCL) is different from the first and second programming languages), the third software specification defining one or more data relationships between the first software specification and the second software specification; forming a representation of the first software specification in a fourth programming language (e.g., a data flow graph) different from the first, second, and third programming languages; forming a representation of the second software specification in the fourth programming language; analyzing the third software specification to identify the one or more data relationships; and forming a combined representation of the first software specification and the second software specification in the fourth programming language, including forming connections in the fourth programming language between the representation of the first software specification in the fourth programming language and the representation of the second software specification in the fourth programming language according to the identified one or more data relationships.
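The combining step of this aspect can be pictured with a minimal Python sketch. It assumes that analysis of the third software specification has already yielded, for each program, the data sets it reads and writes. The class ProgramGraph, the functions find_data_relationships and combine, and the read/write roles assigned to the data sets are hypothetical names and values chosen for illustration; they are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class ProgramGraph:
    """Hypothetical stand-in for one program's representation in the fourth language."""
    name: str
    reads: set = field(default_factory=set)   # data sets the program reads (assumed)
    writes: set = field(default_factory=set)  # data sets the program writes (assumed)


def find_data_relationships(programs):
    """Return (producer, data_set, consumer) triples for every data set that
    one program writes and a different program reads."""
    relationships = []
    for producer in programs:
        for consumer in programs:
            if producer is consumer:
                continue
            for data_set in sorted(producer.writes & consumer.reads):
                relationships.append((producer.name, data_set, consumer.name))
    return relationships


def combine(programs):
    """Build a combined representation: the program nodes plus one directed
    connection per identified data relationship."""
    return {
        "nodes": [p.name for p in programs],
        "connections": find_data_relationships(programs),
    }


first = ProgramGraph("COBOL1", reads={"DS1.data"}, writes={"DS2.data"})
second = ProgramGraph("COBOL2", reads={"DS2.data"}, writes={"DS3.data"})
combined = combine([first, second])
print(combined["connections"])
```

In this sketch the shared data set "DS2.data" is what produces the single connection from the first program to the second; a real translator would derive the same information from the job control script rather than from hand-written sets.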

Aspects can include one or more of the following features.

The first programming language is a procedural programming language.

The fourth programming language enables parallelism between different portions of a software specification.

The fourth programming language enables multiple types of parallelism, including: a first type of parallelism that enables multiple instances of a portion of a software specification to operate on different portions of an input data stream; and a second type of parallelism that enables different portions of a software specification to execute concurrently on different portions of the input data stream.

The second programming language is a procedural programming language.

The second programming language is the same as the first programming language.

The one or more data relationships between the first software specification and the second software specification include at least one data relationship corresponding to the first software specification receiving data from a first data set and the second software specification providing data to the first data set.

The fourth programming language is a data flow graph based programming language.

The connections in the fourth programming language correspond to directed links representing flows of data.

The first software specification is configured to interact with one or more data sets, each data set having an associated data set type of a plurality of data set types in the first software specification, and the second software specification is configured to interact with one or more data sets, each data set having an associated type of a plurality of data set types in the second software specification. The method further includes: processing the first software specification, the processing including: identifying the one or more data sets of the first software specification and, for each of the identified data sets, determining the associated type of the data set in the first software specification; and forming the representation of the first software specification in the fourth programming language, including, for each of the identified data sets, forming a specification of the data set in the fourth programming language, the specification of the data set in the fourth programming language having a type corresponding to the associated type of the data set in the first programming language; wherein at least one of the specifications of the one or more data sets in the fourth programming language has an input data set type or an output data set type; and processing the second software specification, the processing including: identifying the one or more data sets of the second software specification and, for each of the identified data sets, determining the associated type of the data set in the second software specification; and forming the representation of the second software specification in the fourth programming language, including, for each of the identified data sets, forming a specification of the data set in the fourth programming language, the specification of the data set in the fourth programming language having a type corresponding to the associated type of the data set in the first programming language; wherein at least one of the specifications of the one or more data sets in the fourth programming language enables an input function or an output function.

Forming the combined representation includes at least one of: forming one or more connections that replace connections between the specifications of the one or more data sets of the second software specification in the fourth programming language that enable input functions and the representation of the second software specification in the fourth programming language with connections between the representation of the first software specification in the fourth programming language and the representation of the second software specification in the fourth programming language; or forming one or more connections that replace connections between the specifications of the one or more data sets of the first software specification in the fourth programming language that enable input functions and the representation of the first software specification in the fourth programming language with connections between the representation of the second software specification in the fourth programming language and the representation of the first software specification in the fourth programming language.

The method further includes preserving, in the representation of the first software specification in the fourth programming language, the one or more data sets of the first software specification in the fourth programming language that enable output functions, or preserving, in the representation of the second software specification in the fourth programming language, the one or more data sets of the second software specification in the fourth programming language that enable output functions.

The first software specification includes one or more data transformation operations. Analyzing the first software specification includes identifying at least some of the one or more data transformation operations and classifying each identified data transformation operation into a corresponding data transformation type of the fourth programming language. Forming the representation of the first software specification in the fourth programming language includes, for each of the identified data transformation operations, forming a specification of the data transformation operation in the fourth programming language, the specification enabling a data transformation operation corresponding to the data transformation type of the identified data transformation operation in the first programming language.

At least one of the specifications of the one or more data sets in the fourth programming language has a read-only random access data set type.

Determining the associated type of a data set in the first software specification includes analyzing parameters of data set definitions and of commands that access the data set.

The parameters include one or more of: a file organization associated with the data set, an access mode associated with the data set, a mode used to open the data set, and input-output operations.
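As a rough illustration of how such parameters might be turned into a data set type, the following Python sketch classifies a data set from its file organization, access mode, and open mode. The parameter values are ordinary COBOL keywords, but the rules themselves are assumptions made for this example; the actual combinations are the ones tabulated in FIG. 5, which is not reproduced here. The function name classify_data_set is hypothetical.

```python
def classify_data_set(organization, access_mode, open_mode):
    """Map COBOL file parameters to a data set type.

    The rules below are illustrative assumptions only, not the table of FIG. 5.
    """
    organization = organization.upper()
    access_mode = access_mode.upper()
    open_mode = open_mode.upper()

    # Opened for reading and read front to back: treat as a plain input.
    if open_mode == "INPUT" and access_mode == "SEQUENTIAL":
        return "input data set"
    # Opened for writing or appending: treat as an output.
    if open_mode in ("OUTPUT", "EXTEND"):
        return "output data set"
    # Opened for reading but accessed by key: treat as a read-only lookup.
    if (open_mode == "INPUT" and organization == "INDEXED"
            and access_mode in ("RANDOM", "DYNAMIC")):
        return "read-only random access data set"
    return "unclassified"


print(classify_data_set("SEQUENTIAL", "SEQUENTIAL", "INPUT"))
print(classify_data_set("INDEXED", "RANDOM", "INPUT"))
```

A fuller classifier would also look at the input-output operations actually issued against the data set, which this sketch ignores.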

The method further includes storing the combined representation of the first software specification and the second software specification in a storage medium.

The first software specification defines one or more data processing operations that interact with one or more data sets, and the second software specification defines one or more data processing operations that interact with one or more data sets.

The third software specification defines one or more data relationships between the one or more data sets of the first software specification and the one or more data sets of the second software specification.

In another aspect, in general, software for software specification translation is stored in a non-transitory form on a computer-readable medium. The software includes instructions for causing a computing system to: receive a first software specification specified in a first programming language; receive a second software specification specified in a second programming language; receive a third software specification specified in a third programming language different from the first and second programming languages, the third software specification defining one or more data relationships between the first software specification and the second software specification; form a representation of the first software specification in a fourth programming language different from the first, second, and third programming languages; form a representation of the second software specification in the fourth programming language; analyze the third software specification to identify the one or more data relationships; and form a combined representation of the first software specification and the second software specification in the fourth programming language, including forming connections in the fourth programming language between the representation of the first software specification in the fourth programming language and the representation of the second software specification in the fourth programming language according to the identified one or more data relationships.

In another aspect, in general, a computing system for software specification translation includes: an input device or port configured to receive software specifications, the software specifications including: a first software specification specified in a first programming language; a second software specification specified in a second programming language; and a third software specification specified in a third programming language different from the first and second programming languages, the third software specification defining one or more data relationships between the first software specification and the second software specification; and at least one processor configured to process the received software specifications, the processing including: forming a representation of the first software specification in a fourth programming language different from the first, second, and third programming languages; forming a representation of the second software specification in the fourth programming language; analyzing the third software specification to identify the one or more data relationships; and forming a combined representation of the first software specification and the second software specification in the fourth programming language, including forming connections in the fourth programming language between the representation of the first software specification in the fourth programming language and the representation of the second software specification in the fourth programming language according to the identified one or more data relationships.

Aspects can include one or more of the following advantages.

Converting programs based on identifying certain data relationships between different program specifications makes it possible to form a combined specification that can execute more efficiently in various environments, such as data processing systems. For example, converting programs written in one or more procedural programming languages into a data flow graph representation enables component parallelism, data parallelism, and pipeline parallelism. For component parallelism, a data flow graph includes multiple components interconnected by directed links representing flows of data (or "data flows") between the components, and components in different portions of the data flow graph can run concurrently on separate data flows. For data parallelism, a data flow graph processes data that is divided into multiple segments (or "partitions"), and multiple instances of a component can operate on the segments concurrently, one instance per segment. For pipeline parallelism, components in a data flow graph that are connected by a data flow link can run concurrently, with the upstream component adding data onto the data flow while the downstream component receives data from it.
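Data parallelism, as described above, can be sketched in a few lines of Python. The example assumes a hypothetical component to_upper and a hypothetical round-robin partition helper; neither name comes from the specification. Threads are used only to show the structure of one component instance per partition; in CPython this arrangement does not by itself imply a speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def to_upper(records):
    """A hypothetical component: upper-case every record in its partition."""
    return [r.upper() for r in records]


def partition(records, n):
    """Split records round-robin into n partitions."""
    return [records[i::n] for i in range(n)]


records = ["a", "b", "c", "d", "e"]
parts = partition(records, 2)

# One instance of the component runs per partition.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(to_upper, parts))

print(results)
```

Because pool.map returns results in the order of its inputs, each output list lines up with the partition it was computed from.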

Converting a program written in a procedural programming language (or a specification of at least some portions of the program) into a data flow graph representation of the program can enable different components of the data flow graph representation to execute on different servers.

Intermediate data sets that a program written in a procedural programming language may require (because of its non-parallel nature) can be eliminated from the data flow graph by converting the program into a data flow graph representation and replacing the intermediate data sets with flows of data. In some examples, an intermediate data set is taken out of the path of the data flowing through the data flow graph but is still preserved, to ensure that any other program that uses the data set can still access the data it contains. In some examples, eliminating intermediate data sets can reduce storage and I/O traffic requirements.
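The effect of replacing an intermediate data set with a flow can be illustrated with Python generators. In this sketch the two functions step_one and step_two are hypothetical stand-ins for an upstream and a downstream program, and an in-memory list plays the role of the intermediate data set that a procedural job would write to storage.

```python
def step_one(records):
    """Hypothetical upstream program: keep only positive amounts."""
    for r in records:
        if r > 0:
            yield r


def step_two(records):
    """Hypothetical downstream program: double each amount."""
    for r in records:
        yield r * 2


source = [5, -1, 3]

# Procedural style: step one materializes an intermediate data set,
# and step two starts only after that data set is complete.
intermediate = list(step_one(source))
with_intermediate = list(step_two(intermediate))

# Data flow style: records flow directly from step one into step two,
# so no intermediate data set is stored.
with_flow = list(step_two(step_one(source)))

print(with_intermediate, with_flow)
```

Both styles yield the same records; the difference is that the second never holds the full intermediate result, which is the storage and I/O saving the paragraph above refers to.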

Converting programs written in one or more procedural programming languages into a data flow graph representation enables visualization of data lineage through the programs.

Data flow programming languages are agnostic to database type. Converting a program written in a procedural programming language into a data flow graph representation of the program can therefore enable the program to be used with types of databases that the program as written in the procedural programming language did not originally support. That is, the approaches can abstract the inputs and outputs in the code (e.g., JCL/COBOL code) into flows that can be connected to many different types of sources and sinks (e.g., queues, database tables, files, etc.).

Converting a program written in a procedural programming language into a data flow graph representation of the program can enable the use of reusable user-defined data types. This is an advantage over some procedural programming languages, such as COBOL, which make no clear distinction between data types (i.e., metadata) and storage allocation but instead combine the two in the data division. The approaches described herein extract metadata from COBOL source code and create reusable data types (e.g., DML data types) and type definition files. The reusable data types and type definition files can be used for storage allocation at the top of a procedural transform as well as for port and lookup file record definitions. In some examples, the extracted data types (e.g., data type metadata from COBOL), in combination with the extracted data sets (e.g., data set metadata from JCL), can also be used to consolidate the partial descriptions of data (i.e., partial metadata) from multiple programs that access the same data set into a comprehensive description of the data.

Converting programs written in one or more procedural programming languages into a data flow graph representation enables simplified editing of the programs through a data flow graph based graphical development environment.

Other features and advantages of the invention will become apparent from the following description and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a system including a software translation module.

FIG. 2 is a schematic example of a software specification.

FIG. 3 is a block diagram of a top-level software translation module.

FIG. 4 is a block diagram of a software translation module.

FIG. 5 is a table of data set functions and their associated combinations of data set organization type, access mode, and open mode.

FIG. 6 is a first procedural transform.

FIG. 7 is a second procedural transform.

FIG. 8 is a third procedural transform.

FIG. 9 is a data flow graph representation of a program.

FIG. 10 illustrates the creation of a composite data flow graph.

FIG. 11 is an operational example of the top-level software translation module of FIG. 3.

FIG. 12 is an operational example of the software translation module of FIG. 4.

FIG. 13 is a composite data flow graph.

DETAILED DESCRIPTION

FIG. 1 shows an example of a data processing system 100 in which programs can be translated using the source code translation techniques described herein. The translated programs can be executed to process data from a data source 102 of the data processing system 100. A translation module 120 receives as input a first software specification 122 in one or more procedural programming languages and processes the software specification 122 to generate a composite data flow graph representation 332 of the first software specification 122 in a data flow based programming language. The data flow graph representation 332 of the first software specification 122 is stored in a data storage system 116, from which it can be presented visually within a development environment 118. A developer 120 can use the development environment 118 to verify and/or modify the data flow graph representation 332 of the first software specification 122.

The system 100 includes a data source 102 that may include one or more sources of data, such as storage devices or connections to online data streams, each of which may store or provide data in any of a variety of formats (e.g., database tables, spreadsheet files, flat text files, or a native format used by a mainframe). An execution environment 104 includes a loading module 106 and an execution module 112. The execution environment 104 may be hosted, for example, on one or more general-purpose computers under the control of a suitable operating system, such as a version of the UNIX operating system. For example, the execution environment 104 can include a multiple-node parallel computing environment including a configuration of computer systems using multiple central processing units (CPUs) or processor cores, either local (e.g., multiprocessor systems such as symmetric multiprocessing (SMP) computers), or locally distributed (e.g., multiple processors coupled as clusters or massively parallel processing (MPP) systems), or remote, or remotely distributed (e.g., multiple processors coupled via a local area network (LAN) and/or wide area network (WAN)), or any combination thereof.

The loading module 106 loads the data flow graph representation 332 into the execution module 112, where it is executed to process data from the data source 102. Storage devices providing the data source 102 may be local to the execution environment 104, for example, being stored on a storage medium connected to a computer hosting the execution environment 104 (e.g., hard drive 108), or may be remote to the execution environment 104, for example, being hosted on a remote system (e.g., mainframe 110) in communication with a computer hosting the execution environment 104 over a remote connection (e.g., provided by a cloud computing infrastructure). The data flow graph representation 332 executed by the execution module 112 can receive data from a variety of types of systems that may embody the data source 102, including different forms of database systems. The data may be organized as records (also called "rows") having values for respective fields (also called "attributes" or "columns"), including possibly null values. When first reading data from a data source, the data flow graph representation 332 typically starts with some initial format information about the records in that data source. In some circumstances, the record structure of the data source may not be known initially and may instead be determined after analysis of the data source or the data. The initial information about records can include, for example, the number of bits that represent a distinct value, the order of fields within a record, and the type of value (e.g., string, signed/unsigned integer) represented by the bits.

The data flow graph representation 332 may be configured to generate output data, which may be stored back in the data source 102 or in the data storage system 116 accessible to the execution environment 104, or otherwise used. The data storage system 116 is also accessible to the development environment 118. In some implementations, the development environment 118 is a system for developing applications as data flow graphs that include vertices (representing data processing components or data sets) connected by directed links (also called "flows," representing flows of work elements, i.e., data) between the vertices. For example, such an environment is described in more detail in U.S. Publication No. 2007/0011668, titled "Managing Parameters for Graph-Based Applications," incorporated herein by reference. A system for executing such graph-based computations is described in U.S. Patent No. 5,966,072, titled "EXECUTING COMPUTATIONS EXPRESSED AS GRAPHS," incorporated herein by reference. Data flow graphs made in accordance with this system provide methods for getting information into and out of the individual processes represented by graph components, for moving information between the processes, and for defining a running order for the processes. This system includes algorithms that choose inter-process communication methods from any available methods (for example, communication paths according to the links of the graph can use TCP/IP or UNIX domain sockets, or use shared memory to pass data between the processes).
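The idea of a graph of vertices joined by directed links, from which a running order can be derived, can be shown with a small Python sketch. The vertex names and the links list are hypothetical, and the standard-library graphlib module is used here only as a convenient way to compute one valid order; it is not the mechanism of the system described above.

```python
from graphlib import TopologicalSorter

# Directed links (flows) between vertices: (upstream, downstream).
links = [("input", "filter"), ("filter", "sort"), ("sort", "output")]

# TopologicalSorter expects each vertex mapped to the set of its predecessors.
predecessors = {}
for upstream, downstream in links:
    predecessors.setdefault(downstream, set()).add(upstream)
    predecessors.setdefault(upstream, set())

# A running order in which every vertex comes after the vertices feeding it.
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```

For this simple chain there is only one valid order; in a graph with independent branches, several orders are valid and vertices on different branches could run concurrently.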

1 Software Specification

In some examples, the first software specification 122 is specified using one or more procedural text-based programming languages, such as C, C++, Java, C#, IBM's Job Control Language (JCL), COBOL, Fortran, Assembly, and the like. For some of the examples below, the software specification 122 includes a batch processing script written in the JCL scripting language and a number of programs written in the COBOL programming language. The JCL script references the COBOL programs and implements a decision-based controlled flow of execution. It should be understood that the first software specification 122 is not limited to a combination of the JCL and COBOL programming languages; this combination is used only to illustrate one exemplary embodiment of the translation module 120.

Referring to FIG. 2, a schematic view of one example of the software specification 122 of FIG. 1 includes a JCL script 226 including a number of steps 230, some of which execute COBOL programs 228. Other possible steps of the JCL script 226 are omitted to simplify this description. Each step in the JCL script that executes a COBOL program specifies the name of the COBOL program (e.g., COBOL1) and the data sets on which the COBOL program operates. For example, step 3 of the JCL script executes the COBOL program called "COBOL1" on the "DS1.data" and "DS2.data" data sets. In the JCL script 226, each data set associated with a given COBOL program is assigned a file handle (also referred to as a "DD name"). For example, in FIG. 2, "DS1.data" is assigned the file handle "A" and "DS2.data" is assigned the file handle "B." Each of the COBOL programs 228 includes source code (written in the COBOL programming language) for operating on the data sets specified by the JCL script 226. The file handle (i.e., the DD name) for a given data set is the identifier that both the JCL script 226 and the code of the COBOL program use to identify the data set.
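As a minimal Python sketch (not part of the described system), the relationship between a JCL step, the program it executes, and its DD names can be modeled as a small record. The class name, field names, and method are hypothetical; only the step number, program name, file handles, and data set names come from the example of FIG. 2.

```python
from dataclasses import dataclass, field


@dataclass
class JclStep:
    """Hypothetical model of one JCL step that executes a COBOL program."""
    number: int
    program: str
    # Maps a DD name (file handle) to the data set it identifies.
    dd_names: dict = field(default_factory=dict)

    def data_set_for(self, handle: str) -> str:
        """Resolve a file handle used inside the COBOL program to its data set."""
        return self.dd_names[handle]


# Step 3 of the example: COBOL1 operates on DS1.data (handle "A")
# and DS2.data (handle "B").
step3 = JclStep(number=3, program="COBOL1",
                dd_names={"A": "DS1.data", "B": "DS2.data"})
```

The point of the sketch is that the handle, not the data set name, is the shared key: the script binds it to a data set, and the program refers to the data set only through it.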

In operation, a conventional job scheduler running on, for example, an IBM mainframe computer accesses the JCL script 226 and executes the steps 230 of the script sequentially (i.e., one at a time) according to the control flow defined by the JCL script 226. In general, any COBOL program accesses its input or output data sets by reading from or writing to the storage medium that stores the data sets (e.g., a storage medium of the data source 102 or the data storage system 116, such as a hard disk drive, referred to simply as "disk"). Typically, each COBOL program executed by the JCL script 226 reads all of its input data from disk and writes all of its output data to disk before passing control back to the JCL script 226. Consequently, any step that relies on the output of a previous step for its input data must read that input data from disk.

2 Translation Module

Referring to FIG. 3, one example of the translation module 120 of FIG. 1 receives as input the software specification 122, including the JCL script 226 and the COBOL programs 228 referenced by the JCL script 226, and processes the software specification 122 to generate a composite data flow graph 332 that implements the same functionality as the first software specification 122 and is usable by the execution environment 104 of FIG. 1. The translation module 120 includes a COBOL to data flow graph translator 334 and a composite graph synthesizer 336.

In general, the COBOL to data flow graph translator 334 receives the COBOL programs 228 as input and converts each COBOL program into a separate data flow graph representation 338 of that COBOL program. As described in more detail below, each data flow graph representation 338 of a COBOL program includes a data flow graph component referred to as a "process transformation" component and zero or more data sets and/or other data flow graph components. The process transformation component includes ports, such as input ports and output ports, for connecting it to the data sets and other components of the data flow graph representation 338 of the COBOL program, and it performs some or all of the functionality of the COBOL program. In some examples, the data flow graph representation of the COBOL program includes data flow graph components that are analogous to commands present in the COBOL program. In some examples, a data flow graph representation 338 of a COBOL program can be implemented as a "subgraph" that has input ports and output ports for forming flows between an instance of the data flow graph representation 338 and other data flow graph components (e.g., other data flow graph components of the composite data flow graph 332 of FIG. 3).

The JCL script 226 and the data flow graph representations 338 of the COBOL programs are provided to the composite graph synthesizer 336, which analyzes the JCL script 226 to determine the data flow interconnections between the COBOL programs and any other components. The composite graph synthesizer 336 then synthesizes the composite data flow graph 332 by joining the input/output ports of the data flow graph representations 338 of the COBOL programs using flows, according to the determined data flow interconnections. The composite graph synthesizer 336 determines the data flow interconnections between the COBOL programs by identifying "intermediate" data sets that are written at an earlier step of the JCL and read at a later step of the JCL. In some examples, the intermediate data sets can be eliminated and replaced by data flows between the components in the composite data flow graph 332. Because of pipeline parallelism, significant performance improvements can be achieved by allowing data to flow directly between components without the intermediate steps of writing to and reading from disk. Note that the term "eliminated" as used above does not necessarily mean that the intermediate data set is deleted. In some examples, an intermediate data set that is taken out of the path of the data flowing through the data flow graph is still written to disk, to ensure that other programs that depend on the intermediate data set (e.g., programs executed by other JCL scripts) can still access its data. Where intermediate files can be eliminated entirely (because the JCL deletes them after they have been used), the data flow graph representation also reduces storage capacity requirements.
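The identification of intermediate data sets described above can be sketched in Python as follows. This is only an illustration under simplifying assumptions: each step is reduced to a hypothetical `(program, inputs, outputs)` tuple, and the function name is invented. The data set names follow the COBOL1, COBOL2, and COBOL3 examples used elsewhere in this description.

```python
def find_intermediate_data_sets(steps):
    """Find data sets written at an earlier step and read at a later step.

    steps: list of (program, inputs, outputs) tuples in JCL execution order.
    Returns a list of (data_set, writer_program, reader_program) triples.
    """
    last_writer = {}  # data set name -> program that most recently wrote it
    links = []
    for program, inputs, outputs in steps:
        # Inputs are checked before this step's outputs are recorded, so a
        # step never appears to depend on itself.
        for data_set in inputs:
            if data_set in last_writer:
                links.append((data_set, last_writer[data_set], program))
        for data_set in outputs:
            last_writer[data_set] = program
    return links


steps = [
    ("COBOL1", ["DS1.data"], ["DS2.data"]),
    ("COBOL2", ["DS2.data", "DS3.data"], ["DS4.data"]),
    ("COBOL3", ["DS4.data", "DS5.data"], ["DS6.data"]),
]
links = find_intermediate_data_sets(steps)
```

Each returned triple is a candidate for replacement by a flow from the writer's output port to the reader's input port.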

In some examples, the sequential nature of certain steps in the JCL code can be ignored, yielding component parallelism in the composite data flow graph 332. In other examples, for steps in which the output of one step is provided as the input to another step, the sequential nature of the steps is preserved by connecting the respective components of the steps using a flow, which yields pipeline parallelism.
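The streaming behavior behind pipeline parallelism can be illustrated with Python generators. This is a single-threaded sketch, so it shows only that a downstream step can start consuming records before the upstream step has finished producing them; it does not show real concurrent execution. The function names are hypothetical stand-ins for two connected steps.

```python
def upstream(count, log):
    """Stand-in for an earlier step: emits records one at a time."""
    for i in range(count):
        log.append(f"produce {i}")
        yield i


def downstream(records, log):
    """Stand-in for a later step: processes each record as it arrives."""
    for record in records:
        log.append(f"consume {record}")
        yield record * 2


log = []
result = list(downstream(upstream(2, log), log))
```

After this runs, `log` alternates between produce and consume entries, which is the record-by-record hand-off that a flow provides and that a write-everything-then-read-everything intermediate file does not.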

2.1 COBOL to Data Flow Graph Translator

Referring to FIG. 4, a detailed block diagram of an implementation of the COBOL to data flow graph translator 334 receives a number of COBOL programs 228 as input and processes the COBOL programs 228 to generate a number of data flow graph representations 338 of the COBOL programs. The COBOL to data flow graph translator 334 includes a COBOL parser 440, an internal component analyzer 444, a data set function analyzer 442, a metadata analyzer 441, an SQL analyzer 443, a procedure division translator 445, and a subgraph synthesizer 446.

Each COBOL program 228 is first provided to the COBOL parser 440, which parses the COBOL program 228 to generate a parse tree. The parse tree generated by the COBOL parser 440 is then passed to the internal component analyzer 444, the data set function analyzer 442, the metadata analyzer 441, and the SQL analyzer 443.

The internal component analyzer 444 analyzes the parse tree to identify program processes that have an analogous data flow graph component in the data flow graph programming language (e.g., an internal sort). Some examples of COBOL operations that can be converted to data flow graph components are "internal sort" and "internal recirculate" (temporary storage) operations. An internal sort operation corresponds to a component with an input port that receives a flow of unsorted data and an output port that provides a flow of sorted data, with the input and output ports linked to a main component, as described in more detail below. An internal recirculate operation corresponds to an intermediate file that is first written sequentially in its entirety and then read in its entirety within the COBOL program. The output of the internal component analyzer 444 is an internal components result 448 that includes a listing of the identified operations along with their corresponding locations in the COBOL parse tree.

The above applies to any procedural language in which a statement, or a sequence of statements and/or operations, can be identified that performs a particular transformation on a series of records in a flow and that therefore corresponds to a component or subgraph that receives the flow at an input port and provides the transformed records from an output port.

The data set function analyzer 442 analyzes the parse tree to identify all of the data sources and sinks (e.g., data sets) that are accessed (e.g., opened, created, written to, or read from) by the COBOL program 228 and to determine a type associated with each data set of the COBOL program. To do so, the data set function analyzer 442 identifies and analyzes the COBOL statements that access (e.g., open, read, write, delete, etc.) the data sets. In some examples, the possible types that can be associated with a data set include: input, output, lookup, and updatable lookup. The COBOL definitions specify a handle or path to the data set, the file organization of the data set, and the access mode of the data set, with additional information, such as the file open mode(s), determined from the Input-Output statements.

Possible data set file organizations include: SEQUENTIAL, INDEXED, and RELATIVE. A data set with SEQUENTIAL organization includes records that can only be accessed sequentially (i.e., in the order in which they were originally written to the data set). A data set with INDEXED organization includes records that are each associated with one or more indexed keys. Records of an INDEXED data set can be accessed randomly using a key, or sequentially from any given position in the file. A data set with RELATIVE organization has record slots numbered with positive integers, with each slot either marked as empty or containing a record. When a file with RELATIVE organization is read sequentially, empty slots are skipped. The records of a RELATIVE file can be accessed directly using the slot number as a key. The notion of "file position" is common to all three file organizations.
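The behavior described for RELATIVE organization can be sketched with a small in-memory stand-in. This Python example is illustrative only: the dictionary representation and the function names are assumptions made for the sketch, not how COBOL run-time systems store such files.

```python
# Hypothetical stand-in for a RELATIVE file: slot number -> record,
# where None marks an empty slot.
relative_file = {1: "record-1", 2: None, 3: "record-3"}


def read_sequentially(slots):
    """Yield records in slot order, skipping empty slots."""
    for slot_number in sorted(slots):
        record = slots[slot_number]
        if record is not None:
            yield record


def read_by_slot(slots, slot_number):
    """Direct access using the slot number as the key."""
    return slots[slot_number]


sequential_records = list(read_sequentially(relative_file))
```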

Possible access modes include: SEQUENTIAL, RANDOM, and DYNAMIC. SEQUENTIAL access mode indicates that records in the data set are accessed sequentially in entry order, or in ascending or descending key order. RANDOM access mode indicates that records in the data set are accessed using a record-identifying key. DYNAMIC access mode indicates that records in the data set can be accessed directly using a record-identifying key, or sequentially from any selected file position.

Possible open modes include: INPUT, OUTPUT, EXTEND, and I-O. INPUT open mode indicates that the data set is opened as an input data set. OUTPUT open mode indicates that an empty data set is opened as an output data set. EXTEND open mode indicates that a data set containing preexisting records is opened as an output data set to which new records are appended. I-O open mode indicates that the data set is opened in a mode that supports both input and output data set operations (regardless of whether such operations are actually present in the program).

The data set function analyzer 442 applies the following set of rules to the file organization, access mode, and open mode of the COBOL data set access commands to determine the function associated with each data set of the COBOL program:

· OUTPUT data sets are data sets with SEQUENTIAL, INDEXED, or RELATIVE organization; SEQUENTIAL, RANDOM, or DYNAMIC access mode; and OUTPUT or EXTEND open mode.

· INPUT data sets are data sets with INDEXED, RELATIVE, or SEQUENTIAL organization; SEQUENTIAL access mode; and INPUT open mode.

· LOOKUP data sets are data sets with INDEXED or RELATIVE organization; RANDOM or DYNAMIC access mode; and INPUT open mode.

· UPDATABLE LOOKUP data sets are data sets with INDEXED or RELATIVE organization; RANDOM or DYNAMIC access mode; and I-O open mode.
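The four rules above can be expressed as a small classification function. The following Python sketch is an illustration rather than the analyzer's actual implementation; the function and constant names are hypothetical, and combinations that none of the four rules covers simply return `None` here, since the text above does not say how they are handled.

```python
ORGANIZATIONS = {"SEQUENTIAL", "INDEXED", "RELATIVE"}
ACCESS_MODES = {"SEQUENTIAL", "RANDOM", "DYNAMIC"}
OPEN_MODES = {"INPUT", "OUTPUT", "EXTEND", "I-O"}


def classify_data_set(organization, access_mode, open_mode):
    """Return the data set function implied by the four rules, or None."""
    if (organization not in ORGANIZATIONS
            or access_mode not in ACCESS_MODES
            or open_mode not in OPEN_MODES):
        raise ValueError("unknown organization, access mode, or open mode")

    keyed = organization in {"INDEXED", "RELATIVE"}
    direct = access_mode in {"RANDOM", "DYNAMIC"}

    # OUTPUT: any organization, any access mode, OUTPUT or EXTEND open mode.
    if open_mode in {"OUTPUT", "EXTEND"}:
        return "OUTPUT"
    # INPUT: any organization, SEQUENTIAL access mode, INPUT open mode.
    if open_mode == "INPUT" and access_mode == "SEQUENTIAL":
        return "INPUT"
    # LOOKUP: keyed organization, direct access, INPUT open mode.
    if open_mode == "INPUT" and keyed and direct:
        return "LOOKUP"
    # UPDATABLE LOOKUP: keyed organization, direct access, I-O open mode.
    if open_mode == "I-O" and keyed and direct:
        return "UPDATABLE LOOKUP"
    return None  # not covered by the four rules
```

For instance, an INDEXED file opened for INPUT with RANDOM access is classified as a LOOKUP data set, while the same file opened with SEQUENTIAL access is an ordinary INPUT data set.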

In some examples, an "effective open mode" of a file can be determined by counting the actual input and output operations performed on the file. For example, if a file is opened in I-O mode but has only WRITE operations and no READ or START operations, the "effective open mode" can be reduced to EXTEND.
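A minimal Python sketch of this idea follows. It encodes only the single example given above (I-O reduced to EXTEND when there are writes but no reads or starts); the function name and the shape of the `operation_counts` mapping are assumptions made for illustration.

```python
def effective_open_mode(declared_mode, operation_counts):
    """Reduce a declared I-O open mode to EXTEND when only writes occur.

    operation_counts: mapping from operation name to how many times it
    appears for the file, e.g. {"WRITE": 3, "READ": 0, "START": 0}.
    """
    if declared_mode == "I-O":
        writes = operation_counts.get("WRITE", 0)
        reads_or_starts = (operation_counts.get("READ", 0)
                           + operation_counts.get("START", 0))
        if writes > 0 and reads_or_starts == 0:
            return "EXTEND"
    return declared_mode
```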

Referring to FIG. 5, a table 501 lists the different combinations of data set organization, access mode, and open mode, along with the data set function associated with each combination.

Referring again to FIG. 4, the output of the data set function analyzer 442 is a data set functions result 450 that includes a listing of all data sets accessed by the COBOL program along with their associated function in the COBOL program.

The metadata analyzer 441 analyzes the parse tree to extract metadata and to create reusable data types (e.g., DML data types) and type definition files. The reusable data types are distinct from the storage allocation in the COBOL program. The output of the metadata analyzer 441 is a data types result 447.

The SQL analyzer 443 analyzes the parse tree to identify embedded structured query language (SQL) code (or simply "embedded SQL") in the COBOL program. Any identified embedded SQL is processed into database interface information 449. A database application programming interface (API) for accessing a database can provide primitives that can be used within the database interface information 449. In some examples, including these primitives avoids the need to compile portions of the embedded SQL against a particular schema in order to access a particular database (e.g., compiling into a binary form that is operated on using binary operations). Instead, some of the efficiency that such compilation would provide can be traded for the flexibility of interpreting the embedded SQL at runtime using the appropriate API primitives placed within the database interface information 449, potentially using a different database and/or schema as needed.

The parse tree of the COBOL program is then provided to the procedure division translator 445, along with the internal components result 448, the data set functions result 450, the data types result 447, and the database interface information result 449. The procedure division translator 445 analyzes the parse tree to translate the COBOL logic into a "process transformation" data flow graph component 452. In general, the process transformation data flow graph component 452 is a container-type component that contains some or all of the COBOL logic associated with the COBOL program and has input ports and output ports for receiving input data into the component and providing output data from the component, respectively. Where the COBOL code includes code from a different programming language (e.g., SQL code identified by the SQL analyzer 443 and provided in the database interface information result 449), the procedure division translator 445 uses the database interface information result 449 to generate an appropriate representation of that embedded code within the process transformation data flow graph component 452. In some examples, the procedure division translator 445 uses a database API to generate the appropriate representation of the embedded code. In other examples, embedded SQL tables and cursors are replaced with input table components, so that FETCH operations are replaced with calls to read_record(port_number), as is done for files.

In some examples, the procedure division translator 445 generates only a file that includes Data Manipulation Language (DML) code representing the procedural logic of the COBOL program. The subgraph synthesizer 446 then generates the process transformation data flow graph component that uses the file generated by the procedure division translator 445.

Note that FIG. 4 and the above description relate to one possible order of operation of the internal component analyzer 444, the data set function analyzer 442, the metadata analyzer 441, and the SQL analyzer 443. However, the order of operation of these analyzers is not limited to the order described above, and other orders of operation are possible.

Referring to FIG. 6, one simple example of a process transformation component 554 named "COBOL2" (i.e., the result of translating the COBOL program executed at step 5 of the JCL script 226 of FIG. 2) has an input port 556 labeled "in0," an output port 560 labeled "out0," and a lookup port 562 labeled "lu0." Note that lookup data sets are not necessarily accessed via a port on the component; they can instead be accessed using a lookup data set API. However, to simplify the description, lookup data sets are described here as being accessed via a lookup port.

Each of these ports is configured to be connected by a flow to its respective data set (as identified by the JCL script 226). In some examples, a developer can view and edit the DML translation of the COBOL code underlying the process transformation component 554 by, for example, shift-double-clicking on the component, or by hovering over the component until an information bubble appears and clicking a "transform" link in the information bubble.

Referring to FIG. 7, another example of a process transformation component 664 illustrates a situation in which a COBOL program named "COBOL1" (i.e., the COBOL program executed at step 3 of the JCL script 226 of FIG. 2) includes a sort command in its code. In this situation, the internal component analyzer 444 identifies the sort command and passes information related to the sort command to the procedure division translator 445. The procedure division translator 445 uses the information from the internal component analyzer 444 to replace the sort command in the code associated with the process transformation 664 with an interface to a specialized internal sort subgraph. The subgraph synthesizer 446 uses the sort information in the internal components result 448 and creates an output port, out1, for providing the data to be sorted from the process transformation 664 to an internal sort data flow subgraph component 666, and an input port, in1, for receiving the sorted data from the internal sort data flow subgraph component 666.

Referring to FIG. 8, another similar example of a process transformation that includes a sort command is illustrated. In this example, rather than creating a single process transformation having an output for providing data to be sorted and an input for receiving the sorted data, two process transformations are created. A first process transformation 768 of the two has an output for providing the data to be sorted, and a second process transformation 770 of the two has an input for receiving the sorted data. As illustrated, in some examples a sort data flow component 766 can be automatically connected between the two process transformations 768, 770 by the subgraph synthesizer 446. In other examples, the sort data flow component 766 can be manually connected between the two process transformations 768, 770.

2.2 Subgraph Synthesizer

Referring again to FIG. 4, the process transformation 452 for the COBOL program is passed to the subgraph synthesizer 446 along with the internal components result 448, the data set functions result 450, the data types result 447, and the database interface information result 449. The subgraph synthesizer 446 uses these inputs to generate the data flow graph representation 338 for the COBOL program 228. In general, for each COBOL program 228, the subgraph synthesizer 446 creates a data flow graph that includes the process transformation for the COBOL program 228, the data sets associated with the COBOL program 228, and any internal components identified by the internal component analyzer 444. The subgraph synthesizer 446 then uses the internal components result 448 and the data set functions result 450 to appropriately connect flows between the data sets, the internal components, and the process transformation 452. The subgraph synthesizer 446 uses the data types result 447 to describe the data flowing through the component ports. Referring to FIG. 9, one example of a data flow graph representation 838 for the exemplary COBOL program named COBOL1 includes a process transformation 864 having: an input port labeled in0, connected by a flow to an input file with the file handle "A" associated with data set DS1.data; an output port labeled out0, connected by a flow to an output file with the file handle "B" associated with data set DS2.data; and an output port out1 and an input port in1, connected by flows to an internal sort component 866.
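The FIG. 9 example can be written down as a small graph of components and flows. The Python sketch below is only one hypothetical way to represent such a graph; the `Graph` class, the component labels, and the port names on the data sets and on the sort component ("read", "write", "in", "out") are assumptions, while the ports in0, out0, out1, and in1 of COBOL1 come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical container for components and the flows between them."""
    components: set = field(default_factory=set)
    # Each flow is (source, source_port, target, target_port).
    flows: list = field(default_factory=list)

    def connect(self, source, source_port, target, target_port):
        self.components.update({source, target})
        self.flows.append((source, source_port, target, target_port))


graph = Graph()
# Input file "A" (DS1.data) feeds in0; out0 feeds output file "B" (DS2.data).
graph.connect("A:DS1.data", "read", "COBOL1", "in0")
graph.connect("COBOL1", "out0", "B:DS2.data", "write")
# The internal sort sits on a loop out of and back into the transformation.
graph.connect("COBOL1", "out1", "InternalSort", "in")
graph.connect("InternalSort", "out", "COBOL1", "in1")
```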

2.3 Composite Graph Synthesizer

Referring back to FIG. 3, the data flow graph representations 338 of the COBOL programs are then passed to the composite graph synthesizer 336 along with the JCL script 226. By analyzing the order of execution of the COBOL programs in the JCL script 226, along with the functions of the data sets associated with the COBOL programs, the composite graph synthesizer 336 connects the data flow graph representations of the COBOL code into a single composite data flow graph 332.

For example, referring to FIG. 10, the data flow graph representation of the COBOL program named COBOL2 reads from an input file "C" associated with data set DS2.data at an input port labeled in0, enriches the data by accessing a lookup file "D" associated with DS3.data at a lookup port lu0, and writes to an output file "E" associated with data set DS4.data at an output port labeled out0. The data flow graph representation of the COBOL program named COBOL3 reads from two input data sets, "F" associated with DS4.data at an input port labeled in0 and "G" associated with DS5.data at an input port labeled in1, and writes to an output data set "H" associated with DS6.data at an output port labeled out0. The composite graph synthesizer 336 merges the JCL script 226 information with the information derived from the translation of the COBOL programs to determine that COBOL2 is executed before COBOL3 and that DS4.data is output by COBOL2 and input by COBOL3, so that the output port labeled out0 of COBOL2 can be connected by a flow to the input port labeled in0 of COBOL3, eliminating the need for COBOL3 to read data set DS4.data from disk. FIG. 10 illustrates an exemplary composite data flow graph 932 in which a flow connects the output port labeled out0 of COBOL2 to the input port labeled in0 of COBOL3 through a replicate component 933. The replicate component 933 writes the data to DS4.data on disk and also passes the data directly to the input port labeled in0 of COBOL3 via a flow. In this way, COBOL3 can read the data streamed from COBOL2 without waiting for the data set DS4.data to be written to disk, and the data stored in DS4.data, which is not deleted by the JCL script 226, remains available to other processes.
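The role of a component that both writes a data set to disk and passes the records downstream can be sketched with a Python generator. This is an analogy, not the component's actual implementation: the `replicate` function, the stand-in `cobol2` and `cobol3` functions, the sample records, and the temporary file location are all invented for the example.

```python
import os
import tempfile


def replicate(records, path):
    """Write each record to `path` and also pass it downstream as it arrives."""
    with open(path, "w", encoding="utf-8") as out_file:
        for record in records:
            out_file.write(record + "\n")
            yield record


def cobol2():
    """Stand-in for the upstream component's out0 port."""
    yield from ["r1", "r2", "r3"]


def cobol3(records):
    """Stand-in for the downstream component reading from its in0 port."""
    return [record.upper() for record in records]


path = os.path.join(tempfile.mkdtemp(), "DS4.data")
result = cobol3(replicate(cobol2(), path))

# The data set is still on disk for any other process that needs it.
with open(path, encoding="utf-8") as in_file:
    on_disk = in_file.read().splitlines()
```

The downstream stand-in receives each record as soon as it has been handed to the file writer, rather than after the whole data set has been written and reopened.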

In some examples, if a JCL procedure does not delete an intermediate data set (e.g., a file) after it is created, the data set is likely to be used by some other process running in the execution environment. In examples where this is the case, the intermediate data set is preserved in the data flow graph representation of the JCL procedure (e.g., by using a replicate component as described above). In some examples, if the JCL procedure does delete the intermediate data set after it is created, the intermediate data set is eliminated entirely from the data flow graph representation of the JCL procedure, and no replicate component is needed for it.
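The decision described in this paragraph can be reduced to a short Python sketch. The function name, its parameters, and the way the resulting chain of components is represented are hypothetical; the rule itself (keep the data set when the JCL does not delete it, drop it when the JCL does) is the one stated above.

```python
def connect_through_intermediate(writer, reader, data_set, deleted_by_jcl):
    """Return the chain of components linking the writer to the reader."""
    if deleted_by_jcl:
        # Nothing else can depend on the data set, so a direct flow suffices.
        return [writer, reader]
    # Other processes may read the data set, so it is still written to disk.
    return [writer, f"replicate(to {data_set})", reader]


kept = connect_through_intermediate("COBOL2", "COBOL3", "DS4.data", False)
dropped = connect_through_intermediate("COBOL2", "COBOL3", "DS4.data", True)
```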

In some examples, the metadata of the ports connected by a flow, as for the COBOL2 and COBOL3 data flow graphs described above, may not be identical, because the first software specification used alternate definitions for the same data set. The composite graph synthesizer 336 can then insert a redefine format component on the connecting flow. The presence of such redefine format components can later be used to consolidate the data set metadata. The metadata information is derived by the metadata analyzer 441 for each data flow graph 338.

3 Example Operation

Referring to FIG. 11, a simple example of the operation of the translation module 120 receives the JCL script 226 and the four COBOL programs 228 of FIG. 2 as input and processes them to generate a composite data flow graph 332.

In a first stage of the translation process, the COBOL programs 228 are provided to the COBOL to data flow graph translator 334, which processes each of the COBOL programs to generate data flow graph representations 338a-d of the COBOL programs. In a second stage, the JCL script 226 and the data flow graph representations 338a-d of the COBOL programs are provided to the composite graph synthesizer 336, which processes them to generate the composite data flow graph 332.

Referring to FIG. 12, the COBOL to data flow graph translator 334 processes each of the COBOL programs 228 using a COBOL parser 440, an internal component analyzer 444, a data set function analyzer 442, a metadata analyzer 441, and an SQL analyzer 443. The outputs generated by the COBOL parser 440, the internal component analyzer 444, the data set function analyzer 442, the metadata analyzer 441, and the SQL analyzer 443 are provided to a procedure division translator 445 and, together with the output of the procedure division translator 445, to a subgraph synthesizer 446, which generates a data flow graph representation 338a-d for each of the COBOL programs.

For the COBOL1 program, executed at step 3 of the JCL script 226, the internal component analyzer 444 identifies that the program includes an internal sort component. The data set function analyzer 442 identifies that the COBOL1 program accesses one input data set "A" and one output data set "B". The identified internal sort component, the data sets, and their relationship to the procedural transform for the COBOL1 program are reflected in the data flow graph representation 338a of the COBOL1 program.

For the COBOL2 program, executed at step 5 of the JCL script 226, the internal component analyzer 444 does not identify any internal components, and the SQL analyzer 443 does not identify any embedded SQL code. The data set function analyzer 442 determines that the COBOL2 program accesses one data set "C" as an input data set, another data set "E" as an output data set, and another data set "D" as a lookup data set. The identified data sets and their relationship to the procedural transform for the COBOL2 program are reflected in the data flow graph representation 338b of the COBOL2 program.

For the COBOL3 program, executed at step 6 of the JCL script 226, the internal component analyzer 444 does not identify any internal components, and the SQL analyzer 443 does not identify any embedded SQL code. The data set function analyzer 442 determines that the COBOL3 program accesses two data sets "F" and "G" as input data sets and one data set "H" as an output data set. The identified data sets and their relationship to the procedural transform for the COBOL3 program are reflected in the data flow graph representation 338c of the COBOL3 program.

For the COBOL4 program, executed at step 10 of the JCL script 226, the internal component analyzer 444 does not identify any internal components, and the SQL analyzer 443 does not identify any embedded SQL code. The data set function analyzer 442 determines that the COBOL4 program accesses one data set "I" as an input data set and another data set "J" as an output data set. The identified data sets and their relationship to the procedural transform for the COBOL4 program are reflected in the data flow graph representation 338d of the COBOL4 program.
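The role assignment performed for each data set in the walkthrough above might look something like the following sketch. The mapping rule is an assumption made for illustration (an output open mode yields an output data set, a read-only data set with random access yields a lookup data set, and any other read-only data set yields an input data set), and the open and access modes given for "C", "D", and "E" are hypothetical values, not taken from the COBOL2 program itself.

```python
def classify_dataset(open_mode, access_mode):
    """Assumed rule of thumb mapping COBOL open/access modes to a role."""
    if open_mode == "OUTPUT":
        return "output"
    if open_mode == "INPUT" and access_mode == "RANDOM":
        return "lookup"  # read-only, random access
    if open_mode == "INPUT":
        return "input"
    return "unknown"


# Hypothetical modes for the three data sets accessed by COBOL2.
cobol2_roles = {
    "C": classify_dataset("INPUT", "SEQUENTIAL"),
    "D": classify_dataset("INPUT", "RANDOM"),
    "E": classify_dataset("OUTPUT", "SEQUENTIAL"),
}
```

Under these assumed modes, the sketch reproduces the roles reported for COBOL2: "C" as input, "D" as lookup, and "E" as output.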

Referring again to FIG. 11, the JCL script 226 and the data flow graph representations 338a-d of the four COBOL programs are provided to the composite graph synthesizer 336, which analyzes the JCL script 226 and the data flow graph representations 338a-d in order to connect the data flow graph representations 338a-d into a single composite graph 332. Referring to FIG. 13, the composite graph for the JCL script 226 and the four COBOL programs 228 of FIG. 2 includes four procedural transforms, COBOL1 452a, COBOL2 452b, COBOL3 452c, and COBOL4 452d, interconnected by flows. Copy components 933 are used to set aside (i.e., write as output data sets) a number of intermediate data sets (i.e., DS2.data, DS4.data, and DS5.data) in the composite data flow graph 332 while the components are connected directly by flows.

4 Alternatives

Although the above description covers only a limited number of operations and elements of a program written in a procedural programming language being translated into data flow graph components, in some examples all of the source code of the original programs (e.g., the COBOL programs) is translated into a data flow graph representation.

The system described above can be used to translate a software specification that includes any combination of one or more procedural programming languages into a data flow graph representation of the software specification.

In some examples, the translation module described above may encounter translation tasks that it is not prepared to handle. In such examples, the translation module outputs a list of the incomplete translation tasks that a developer can read and use to repair the translation manually.
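One way to picture such a list is the sketch below, in which any statement without a registered handler is recorded with its line number instead of being translated. The `translate_statements` function, the handler table keyed by the leading verb, and the sample statements are all hypothetical; the specification does not describe how the list is built or formatted.

```python
def translate_statements(statements, handlers):
    """Translate what can be handled and collect the rest for a developer."""
    translated, unfinished = [], []
    for line_no, stmt in enumerate(statements, start=1):
        verb = stmt.split()[0]
        handler = handlers.get(verb)
        if handler is None:
            # No handler is registered for this verb: record it for review.
            unfinished.append((line_no, stmt))
        else:
            translated.append(handler(stmt))
    return translated, unfinished


# Hypothetical handler table: only MOVE statements are supported here.
handlers = {"MOVE": lambda stmt: f"translated: {stmt}"}
statements = ["MOVE A TO B", "ALTER P1 TO PROCEED TO P2", "MOVE C TO D"]

translated, unfinished = translate_statements(statements, handlers)
```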

Although the above description presents certain modules of the COBOL to data flow graph translator 334 as running in parallel, this is not necessarily the case. In some examples, the metadata analyzer 441 first receives the parse tree from the COBOL parser 440. The metadata analyzer 441 enriches and/or simplifies the parse tree and provides it to the data set function analyzer 442. The data set function analyzer 442 enriches and/or simplifies the parse tree and provides it to the SQL analyzer 443. The SQL analyzer 443 enriches and/or simplifies the parse tree and provides it to the internal component analyzer 444. The internal component analyzer 444 enriches and/or simplifies the parse tree and provides it to the procedure division translator 445. That is, the components operate on the parse tree serially.
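The serial arrangement can be sketched as a chain of functions, each taking the parse tree produced by its predecessor. This is an illustrative sketch only: the dictionary-based parse tree, the `make_stage` helper, and the "annotations" key are assumptions, and real analyzers would attach far richer information than a stage name.

```python
def run_pipeline(parse_tree, stages):
    """Pass the parse tree through each stage in order."""
    for stage in stages:
        parse_tree = stage(parse_tree)
    return parse_tree


def make_stage(name):
    """Build a stand-in analyzer that records its name on the tree."""
    def stage(tree):
        # Return a new tree so that each stage's input is left unchanged.
        return {**tree, "annotations": tree["annotations"] + [name]}
    return stage


stages = [
    make_stage("metadata_analyzer"),
    make_stage("dataset_function_analyzer"),
    make_stage("sql_analyzer"),
    make_stage("internal_component_analyzer"),
]
initial = {"program": "COBOL1", "annotations": []}
result = run_pipeline(initial, stages)
```

The enriched tree returned by the last stage is what would then be handed to the procedure division translator.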

5 Implementation

The source code translation approach described above can be implemented, for example, using a programmable computing system executing suitable software instructions, or it can be implemented in suitable hardware such as a field-programmable gate array (FPGA), or in some hybrid form. For example, in a programmed approach the software may include procedures in one or more computer programs that execute on one or more programmed or programmable computing systems (which may be of various architectures, such as distributed, client/server, or grid), each including at least one processor, at least one data storage system (including volatile and/or non-volatile memory and/or storage elements), and at least one user interface (for receiving input using at least one input device or port, and for providing output using at least one output device or port). The software may include one or more modules of a larger program that, for example, provides services related to the design, configuration, and execution of data flow graphs. The modules of the program (e.g., elements of a data flow graph) can be implemented as data structures or other organized data conforming to a data model stored in a data repository.

The software may be provided on a tangible, non-transitory medium, such as a CD-ROM or other computer-readable medium (e.g., readable by a general-purpose or special-purpose computing system or device), or it may be delivered (e.g., encoded in a propagated signal) over a communication medium of a network to a tangible, non-transitory medium of a computing system where it is executed. Some or all of the processing may be performed on a special-purpose computer, or using special-purpose hardware such as coprocessors, field-programmable gate arrays (FPGAs), or dedicated application-specific integrated circuits (ASICs). The processing may be implemented in a distributed manner in which different parts of the computation specified by the software are performed by different computing elements. Each such computer program is preferably stored on or downloaded to a computer-readable storage medium (e.g., solid-state memory or media, or magnetic or optical media) of a storage device accessible by a general-purpose or special-purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer to perform the processing described herein. The inventive system may also be considered to be implemented as a tangible, non-transitory medium configured with a computer program, where the medium so configured causes a computer to operate in a specific and predefined manner to perform one or more of the processing steps described herein.

A number of embodiments of the invention have been described. Nevertheless, it is to be understood that the foregoing description is intended to illustrate and not to limit the scope of the invention, which is defined by the scope of the following claims. Accordingly, other embodiments are also within the scope of the following claims. For example, various modifications may be made without departing from the scope of the invention. Additionally, some of the steps described above may be order-independent and thus can be performed in an order different from that described.

Claims

1. A method for translating software specifications, comprising:

receiving a first software specification specified in a first programming language;

receiving a second software specification specified in a second programming language;

receiving a third software specification specified in a third programming language different from the first and second programming languages, the third software specification defining one or more data relationships between the first software specification and the second software specification;

forming a representation of the first software specification in a fourth programming language different from the first, second, and third programming languages;

forming a representation of the second software specification in the fourth programming language;

analyzing the third software specification to identify the one or more data relationships; and

forming a combined representation of the first software specification and the second software specification in the fourth programming language, including forming, based on the identified one or more data relationships, a connection between the representation of the first software specification in the fourth programming language and the representation of the second software specification in the fourth programming language.

2. The method according to claim 1, wherein the first programming language is a procedural programming language.

3. The method of claim 2, wherein the fourth programming language enables parallelism between different parts of the software specification.

4. The method of claim 3, wherein the fourth programming language enables multiple types of parallelism, including:

a first type of parallelism enabling multiple instances of a part of the software specification to operate on different portions of an input data stream; and

a second type of parallelism enabling different parts of the software specification to execute concurrently on different portions of the input data stream.

5. The method according to claim 2, wherein the second programming language is a procedural programming language.

6. The method according to claim 2, wherein the second programming language is the same as the first programming language.

7. The method according to claim 1, wherein the one or more data relationships between the first software specification and the second software specification include at least one data relationship, the at least one data relationship corresponding to the first software specification receiving data from the first dataset and the second software specification providing data to the first dataset.

8. The method according to claim 1, wherein the fourth programming language is a data flow graph-based programming language.

9. The method of claim 8, wherein the connection of the fourth programming language corresponds to a directed link representing a data stream.

10. The method of claim 1, wherein the first software specification is configured to interact with one or more datasets, each dataset having an associated dataset type from a plurality of dataset types of the first software specification, and the second software specification is configured to interact with one or more datasets, each dataset having an associated type from a plurality of dataset types of the second software specification, the method further comprising:

Processing the first software specification, the processing includes:

Identify one or more datasets of the first software specification, and determine the associated type of each of the identified one or more datasets in the first software specification; and

Forming a representation of the first software specification using the fourth programming language includes: forming a specification for each of the one or more identified datasets using the fourth programming language, wherein the specification of the dataset in the fourth programming language has a type corresponding to the associated type of the dataset in the first programming language;

The specification of the one or more datasets of the fourth programming language has at least one of the following: an input dataset type or an output dataset type;

Processing the second software specification, the processing includes:

Identify the one or more datasets of the second software specification, and determine the associated type of the dataset in the second software specification for each of the identified one or more datasets; and

Forming a representation of the second software specification using the fourth programming language includes: forming a specification for each of the one or more identified datasets using the fourth programming language, wherein the specification of the dataset in the fourth programming language has a type corresponding to the associated type of the dataset in the first programming language;

The specification of the one or more datasets of the fourth programming language enables input or output functionality.

11. The method of claim 10, wherein forming the combined representation comprises at least one of the following:

One or more connections are formed to replace the connection between the specification of the input function enabled in one or more datasets of the second software specification of the fourth programming language and the representation of the second software specification of the fourth programming language with the connection between the first software specification of the fourth programming language and the representation of the second software specification of the fourth programming language; or

One or more connections are formed to replace the connection between the specification of the one or more datasets that enable input functionality in the first software specification of the fourth programming language and the representation of the second software specification of the fourth programming language with the connection between the second software specification of the fourth programming language and the representation of the first software specification of the fourth programming language.

12. The method of claim 11, further comprising:

The first software specification of the fourth programming language stores one or more datasets of the first software specification of the fourth programming language that enable output functionality, or

The representation of the second software specification of the fourth programming language that enables output functionality stores one or more datasets of the second software specification of the fourth programming language.

13. The method of claim 10, wherein:

The first software specification includes one or more data transformation operations, and analyzing the first software specification includes identifying at least some of the one or more data transformation operations and classifying the identified data transformation operations into corresponding data transformation types of the fourth programming language, and

The representation of the first software specification forming the fourth programming language includes: forming a specification for each of the identified data transformation operations of the fourth programming language, wherein the specification of the data transformation operation of the fourth programming language enables a data transformation operation corresponding to the data transformation type of the identified data transformation operation of the first programming language.

14. The method of claim 10, wherein at least one of the specifications of the one or more datasets of the fourth programming language has a read-only random access dataset type.

15. The method of claim 10, wherein determining the associated type of the dataset in the first software specification includes analyzing parameters of the dataset definition and commands that access the dataset.

16. The method of claim 15, wherein the parameters include one or more of the following: file organization associated with the dataset, access mode associated with the dataset, mode for opening the dataset, and input/output operations.

17. The method according to claim 1, further comprising:

storing the combined representation of the first software specification and the second software specification in a storage medium.

18. The method of claim 1, wherein the first software specification defines one or more data processing operations that interact with one or more datasets, and the second software specification defines one or more data processing operations that interact with one or more datasets.

19. The method of claim 18, wherein the third software specification defines one or more data relationships between the one or more datasets of the first software specification and the one or more datasets of the second software specification.

20. Software stored in a non-transitory form on a computer-readable medium for translating software specifications, said software comprising instructions that cause a computing system to perform the following operations:

21. A computational system for translating software specifications, the computational system comprising:

an input device or port configured to receive software specifications, the software specifications including:

a first software specification specified in a first programming language;

a second software specification specified in a second programming language;

a third software specification specified in a third programming language different from the first and second programming languages, the third software specification defining one or more data relationships between the first software specification and the second software specification; and

At least one processor configured to process the received software specification, the processing including: