CAPIO-CL: CAPIO Coordination Language

CAPIO-CL is a new I/O coordination language that allows users to annotate the data dependencies of file-based workflows with synchronization semantics for files and directories. It aims to enable the transparent overlap of computation and I/O operations among distinct producer-consumer application modules.

The syntax and semantics of CAPIO-CL are expressed in JSON (JavaScript Object Notation). JSON is not tied to any particular programming language or platform and is widely supported across programming languages (Java, C++, Python, etc.).

Although JSON is not the most commonly adopted format among high-level coordination languages for expressing parallel computations, it provides automatic syntax validation through JSON Schema. Using JSON can also make the learning curve less steep for users.

Streaming injection

To explain what we mean by injecting streaming capabilities, consider the following example. Suppose we have a file-based workflow composed of two steps: the first is a producer step and the second is a consumer step. The consumer step consumes the data produced by the producer step. The following picture shows a Gantt diagram of the sequence of operations that the workflow executes. (Figure: Workflow_example)

We can clearly see that we are required to wait for the producer step to terminate before the consumer step can begin to consume the data.

Through CAPIO-CL (and an implementation of the language), we can improve performance. For example, the next picture shows how the same workflow can benefit from streaming injection: (Figure: Workflow_example)
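The potential gain can be estimated with a small back-of-the-envelope model. The sketch below is only an illustration and is not part of CAPIO-CL: it assumes the producer writes its output in equally sized chunks, that each chunk takes a fixed time to produce and to consume, and that the consumer handles one chunk at a time. The function names and parameters are hypothetical.

```python
def batch_makespan(n_chunks, produce_time, consume_time):
    """Total time when the consumer starts only after the producer terminates."""
    return n_chunks * produce_time + n_chunks * consume_time


def streaming_makespan(n_chunks, produce_time, consume_time):
    """Total time when the consumer reads each chunk as soon as it is available."""
    consumer_free_at = 0
    for i in range(n_chunks):
        # Chunk i is complete once the producer has written i + 1 chunks.
        chunk_ready_at = (i + 1) * produce_time
        # The consumer starts when both the chunk and the consumer are ready.
        start = max(chunk_ready_at, consumer_free_at)
        consumer_free_at = start + consume_time
    return consumer_free_at


if __name__ == "__main__":
    # Hypothetical workload: 4 chunks, 2 time units to produce, 1 to consume.
    print(batch_makespan(4, 2, 1))      # 12
    print(streaming_makespan(4, 2, 1))  # 9
```

Under these assumptions the streaming execution never takes longer than the batch one, and the saving grows with the amount of work that the two steps can overlap.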

CAPIO-CL aims to describe file dependencies between steps, so that streaming injection can be coordinated correctly.
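To make the idea of a file dependency concrete, the following sketch derives producer-to-consumer edges from a JSON description of a two-step workflow. The JSON layout and its field names (`steps`, `name`, `reads`, `writes`) and the file name `data.bin` are assumptions made for this illustration only; they are not the actual CAPIO-CL syntax.

```python
import json

# Hypothetical workflow description. This is NOT the CAPIO-CL syntax;
# the field names are invented for this illustration.
WORKFLOW_JSON = """
{
  "steps": [
    {"name": "producer", "reads": [], "writes": ["data.bin"]},
    {"name": "consumer", "reads": ["data.bin"], "writes": []}
  ]
}
"""


def file_dependencies(description):
    """Return (producer, consumer, file) triples found in the description."""
    workflow = json.loads(description)  # raises ValueError on malformed JSON

    # Map each file to the step that writes it.
    writers = {}
    for step in workflow["steps"]:
        for path in step["writes"]:
            writers[path] = step["name"]

    # A dependency exists wherever a step reads a file that another step writes.
    edges = []
    for step in workflow["steps"]:
        for path in step["reads"]:
            if path in writers:
                edges.append((writers[path], step["name"], path))
    return edges


if __name__ == "__main__":
    print(file_dependencies(WORKFLOW_JSON))
    # [('producer', 'consumer', 'data.bin')]
```

Each triple identifies a file on which the consumer could start working while the producer is still running, provided the synchronization semantics of that file allow it.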