The CAPIO middleware
The CAPIO middleware (Cross-Application Programmable I/O) is the reference implementation of the CAPIO-CL language, aimed at injecting streaming capabilities into workflow steps without changing the application codebase. It has been proven to work with C/C++ binaries, Fortran binaries, Java, Python, and Bash.
Compile CAPIO
CAPIO works only on the Linux kernel with the GNU C library (glibc). At the moment, only x86_64 systems are supported (i386 architectures might work, but this has not been tested, nor is testing it among our future goals). This limitation is due to the compatibility of the syscall-interception library that CAPIO uses. We are actively working on porting the interception library to both RISC-V and ARM64 architectures, but there is no ETA for this.
Dependencies
CAPIO depends on the following software, which must be installed manually:
- cmake >= 3.15
- a C++17 (or newer) compiler (the CAPIO middleware uses the C++17 standard)
- openmpi
- pthreads
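On a Debian/Ubuntu system, for example, the manual dependencies can be installed roughly as follows (the package names are assumptions and may differ on your distribution; also check that the packaged cmake is at least 3.15):

sudo apt-get update
# Package names assumed for Debian/Ubuntu; pthreads ships with glibc,
# so no extra package is usually needed for it
sudo apt-get install -y cmake g++ openmpi-bin libopenmpi-dev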
The following dependencies are automatically fetched during the CMake configuration phase and compiled when required.
- syscall_intercept to intercept syscalls
- Taywee/args to parse server command line inputs
- simdjson/simdjson to parse json configuration files
Compile
git clone https://github.com/High-Performance-IO/capio.git capio && cd capio
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
cmake --build . -j$(nproc)
sudo cmake --install .
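As a quick sanity check (assuming the default install prefix is on your PATH and in the linker search path), you can verify that both the server binary and the POSIX interception library were installed:

# Both artifacts should be visible after the install step
which capio_server
ldconfig -p | grep libcapio_posix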
It is also possible to enable logging in CAPIO by defining -DCAPIO_LOG=ON during the CMake configuration phase (be aware that the logger heavily slows down workflow execution and generates huge log files), as long as the build type is Debug. In this case, the cmake command becomes:
cmake -DCMAKE_BUILD_TYPE=Debug -DCAPIO_LOG=ON ..
Docker images
Docker images with CAPIO pre-compiled and installed are available on Docker Hub.
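For example, a pull-and-run session would look like the following (the image name below is a placeholder, not the actual repository; check Docker Hub for the real name and tags):

# <dockerhub-user> is a placeholder for the actual Docker Hub repository
docker pull <dockerhub-user>/capio:latest
docker run --rm -it <dockerhub-user>/capio:latest /bin/bash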
Use CAPIO in your code
Good news! You don't need to modify your code to benefit from the features of CAPIO. You only have to follow three steps (the first one is optional).
1) Write a configuration file to inject streaming capabilities into your workflow (see the CAPIO-CL documentation for this step); a minimal sketch is shown below.
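The field names below follow the CAPIO-CL documentation, but treat this sketch as an assumption and consult that documentation for the authoritative schema. It describes a producer step whose output file is streamed to a consumer as soon as the file is closed:

{
  "name": "wfname",
  "IO_Graph": [
    {
      "name": "producer",
      "output_stream": ["file0.dat"],
      "streaming": [
        { "name": ["file0.dat"], "committed": "on_close" }
      ]
    },
    {
      "name": "consumer",
      "input_stream": ["file0.dat"]
    }
  ]
}

Here the top-level "name" matches the CAPIO_WORKFLOW_NAME used in step 3, and the step names match the CAPIO_APP_NAME of the corresponding applications.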
2) Launch the CAPIO daemons with MPI, passing the (optional) configuration file as an argument, on the machines on which you want to execute your program (one daemon for each node). If you want to specify a custom folder for CAPIO, set CAPIO_DIR as an environment variable.
[CAPIO_DIR=your_capiodir] [mpiexec -N 1 --hostfile your_hostfile] capio_server -c conf.json
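For example, with a hostfile hosts.txt and /tmp/capio as the CAPIO directory (both names are placeholders), the fully expanded command would be:

CAPIO_DIR=/tmp/capio mpiexec -N 1 --hostfile hosts.txt capio_server -c conf.json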
3) Launch your programs preloading the CAPIO shared library like this:
CAPIO_DIR=your_capiodir \
CAPIO_WORKFLOW_NAME=wfname \
CAPIO_APP_NAME=appname \
LD_PRELOAD=libcapio_posix.so \
./your_app <args>
These variables can also be exported, but this is not a recommended practice.
CAPIO_DIR must be specified when launching a program with the CAPIO library. If CAPIO_DIR is not specified, CAPIO will not intercept syscalls.
A note on enabling the CAPIO logger
CAPIO ships with an extremely powerful, custom-built logger that is compiled only for the Debug target. CMake ensures that it is not built in Release mode, and you should not change this, as undefined behaviour occurs when the logger is compiled for the Release target.
If used incorrectly, the logger component will create massive log files, which can and will fill up a lot of space in the file system. Please, if you enable the logger, ensure that only short workflows are executed.
Be aware that the logger will also leak the content of the handled files. If you need support and have to provide log files, please first anonymize the content of the logs, removing any possibly sensitive information.