CAPIO-CL: CAPIO Coordination Language
CAPIO-CL is a new I/O coordination language that lets users annotate the data dependencies of file-based workflows with synchronization semantics for files and directories. CAPIO-CL aims to enable the transparent overlap of computation and I/O operations among distinct producer-consumer application modules.
The syntax and semantics of CAPIO-CL are expressed in JSON (JavaScript Object Notation). JSON is not tied to any particular programming language or platform and is widely supported across programming languages (Java, C++, Python, etc.).
Although JSON is not the most commonly adopted format among high-level coordination languages for expressing parallel computations, it provides automatic syntax validation through JSON Schema. Using JSON can also make the learning curve less steep for users.
Streaming injection
To better explain what we mean by injecting streaming capabilities, let's look at the following example. Suppose we have a file-based workflow composed of two steps: the first is a producer step and the second is a consumer step. The consumer step consumes the data produced by the producer step.
The following picture shows a Gantt chart of the sequence of operations that the workflow executes.
We can clearly see that we are required to wait for the producer step to terminate before the consumer step can begin to consume the data.
Through CAPIO-CL (and an implementation of the language), we are able to improve performance. For example, the next picture shows how the same workflow can benefit from streaming injection:
The CAPIO-CL coordination language aims to describe file dependencies between steps, to ensure that streaming injection can be coordinated correctly.
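To make the idea of overlapping a producer and a consumer more concrete, here is a minimal Python sketch. It does not use CAPIO-CL or any CAPIO implementation: it only mimics by hand what streaming injection is meant to provide transparently. The function names, the file name `data.txt`, and the `__END__` marker are all hypothetical and were chosen for illustration only.

```python
import os
import tempfile
import threading
import time

# Hypothetical end-of-stream marker; it is NOT part of CAPIO-CL.
END_MARKER = "__END__\n"


def producer(path, n_items, delay=0.01):
    """Append one record at a time, flushing so a reader can see it early."""
    with open(path, "a") as f:
        for i in range(n_items):
            f.write(f"record {i}\n")
            f.flush()
            time.sleep(delay)  # stand-in for computation between writes
        f.write(END_MARKER)
        f.flush()


def consumer(path, out, poll=0.005):
    """Read records as they appear instead of waiting for the producer to exit."""
    with open(path, "r") as f:
        buffer = ""
        while True:
            chunk = f.readline()
            if not chunk:
                time.sleep(poll)  # no new data yet: wait and retry
                continue
            buffer += chunk
            if not buffer.endswith("\n"):
                continue  # partial line, wait for the rest
            if buffer == END_MARKER:
                return
            out.append(buffer.rstrip("\n"))
            buffer = ""


def run_streaming(n_items=5):
    """Run producer and consumer concurrently on the same file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        open(path, "w").close()  # the file must exist before the consumer opens it
        consumed = []
        c = threading.Thread(target=consumer, args=(path, consumed))
        p = threading.Thread(target=producer, args=(path, n_items))
        c.start()
        p.start()
        p.join()
        c.join()
        return consumed


if __name__ == "__main__":
    print(run_streaming())
```

In this sketch the synchronization logic (polling, the end marker) is written into the application code itself. The point of a coordination language such as CAPIO-CL is to declare this kind of dependency outside the applications, so that the producer and consumer steps can stay unmodified.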