AiiDA: Full support of asyncio
AiiDA uses plumpy as its workflow backend and circus to daemonize its workflow manager process. However, the plumpy workflow library and the circus process & socket manager have not kept up with recent developments. This forces libraries of the AiiDA ecosystem to run with outdated versions of tornado and makes them incompatible with the latest Python web technology. In this project, I will replace the tornado dependencies of plumpy and aiida-core with asyncio to enable full support of asyncio in aiida-core. If the goals and deliverables are reached before the end of the project, I will also migrate circus from tornado to asyncio. circus is also used by many other open-source projects besides AiiDA.
Coroutines, and asynchronous programming in general, are used in many Python web technologies, such as Jupyter notebooks, Voilà and Bokeh. As web technologies evolve, so do the libraries for asynchronous programming, such as tornado or the asyncio module of the Python standard library (available since Python 3.4).
plumpy [1] is a Python workflow library that supports writing Processes with a well-defined set of inputs and outputs that can be strung together.
circus [2] is a Mozilla Foundation Python library that runs and watches processes and sockets. It can be used as a library or through the command line.
AiiDA uses circus to daemonize its workflow manager process and plumpy as its workflow backend. However, plumpy and circus (both dependencies of aiida-core) have not kept up with recent developments. This forces AiiDA to run with outdated versions of tornado and makes it incompatible with the latest Python web technology. AiiDA is increasingly used in Jupyter notebooks and in web applications on platforms like the AiiDA lab and the Materials Cloud, so resolving this issue is becoming more and more important.
circus is used by many other open-source projects besides AiiDA, and all of them would benefit from this development.
Code bases involved
tests/test_processes.py. Re-design the Future class used in the plumpy library. Unlike the Future object in tornado, the Future instance in asyncio does not have the attribute _done. The same holds for concurrent.futures.Future, if that is used instead. Therefore the SavableFuture class should persist its state through _state rather than _done.
Use ContextVar instead of _thread_local to store and access the context-local variable _process_stack.
The file plumpy/test_utils stores the process demos used in the unit tests. It would be better moved out of the plumpy module into the test module.
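The following is a minimal sketch of the ContextVar idea. It assumes the stack is kept as an immutable tuple. The variable name _process_stack follows the text, but the helpers push_process, pop_process and current_process are hypothetical and are not plumpy's API. The sketch needs Python 3.7+, where each asyncio Task runs in its own copy of the context. Two interleaved processes therefore each see only their own stack. A single thread-local stack would instead be shared, and corrupted, by both tasks.

```python
import asyncio
import contextvars

# Hypothetical context-local stack of running processes. A tuple is used so that
# every update creates a new value instead of mutating another context's stack.
_process_stack = contextvars.ContextVar("process_stack", default=())


def push_process(process):
    """Push a process onto the stack of the current context."""
    _process_stack.set(_process_stack.get() + (process,))


def pop_process():
    """Remove and return the top process of the current context."""
    stack = _process_stack.get()
    _process_stack.set(stack[:-1])
    return stack[-1]


def current_process():
    """Return the process running in this context, or None if there is none."""
    stack = _process_stack.get()
    return stack[-1] if stack else None


async def run_process(name, seen):
    push_process(name)
    await asyncio.sleep(0)  # let the other task run in between
    seen.append((name, current_process()))
    pop_process()


async def main():
    seen = []
    # gather() wraps each coroutine in a Task that gets its own copy of the context.
    await asyncio.gather(run_process("A", seen), run_process("B", seen))
    return seen


if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    try:
        print(loop.run_until_complete(main()))  # [('A', 'A'), ('B', 'B')]
    finally:
        loop.close()
```

Each task reports itself as the current process, even though the two tasks interleave. The stack of the outer context stays empty.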
Two components of aiida-core are involved with asynchronous programming. The first one is aiida/manager. When the daemon is running, it creates a runner and registers with it a task subscriber, which is an instance of the class ProcessLauncher. The AiiDA engine is the other component that relies on asynchronous programming. In fact, the runner used by the manager is created in aiida/engine/runner.py as an instance of the class Runner. An event loop is launched there, and processes are subscribed to the runner and executed asynchronously. There is also a TransportQueue class that yields a future result. This class is used in aiida/engine/processes/calcjobs. It allows clients to register their interest in a transport object, which will be provided at some point in the future.
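The future-based pattern behind TransportQueue can be illustrated with asyncio alone. The class TransportQueueSketch and the functions below are hypothetical and are not aiida-core's implementation. Opening the transport is simulated with a short sleep. All requests for the same key share one asyncio.Future, so the expensive connection is opened only once.

```python
import asyncio


class TransportQueueSketch:
    """Hypothetical, simplified stand-in for the TransportQueue idea.

    Clients register interest in a transport for a key (for example a computer
    name) and get back an asyncio.Future that resolves once the transport is open.
    """

    def __init__(self):
        self._futures = {}  # key -> Future shared by all interested clients
        self._tasks = {}    # key -> Task opening the transport (keeps a reference)
        self.open_count = 0

    def request_transport(self, key):
        future = self._futures.get(key)
        if future is None:
            # Called from a coroutine, so this returns the running loop.
            loop = asyncio.get_event_loop()
            future = loop.create_future()
            self._futures[key] = future
            self._tasks[key] = loop.create_task(self._open(key, future))
        return future

    async def _open(self, key, future):
        await asyncio.sleep(0.01)  # stand-in for opening e.g. an SSH connection
        self.open_count += 1
        future.set_result("transport<{}>".format(key))


async def submit_calculation(queue, key):
    # Each client simply awaits the shared future.
    return await queue.request_transport(key)


async def main():
    queue = TransportQueueSketch()
    clients = (submit_calculation(queue, "cluster") for _ in range(3))
    results = await asyncio.gather(*clients)
    return results, queue.open_count
```

Three clients receive the same transport, and it is opened once. A fuller version would also close the transport when no client needs it any longer. That bookkeeping is left out here.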
The aiida-core code that calls functions in circus should not be affected, so the circus API must stay unchanged. This calling code in aiida-core is a good starting point for gaining a deep insight into the circus library. In the file aiida/engine/daemon/client.py, a circus client instance is created (in the function DaemonClient.client) to control the behaviour of daemonized processes. In the file aiida/cmdline/commands/cmd_daemon.py, users start daemons by calling the start_circus function, which sets up the arbiter config and actually launches the circus daemon.
The circus client is tested in the file circus/tests/test_client.py, and the circus arbiter is tested in the file circus/tests/test_arbiter.py. The main TestCase classes inherit from the class TestCircus in the file circus/tests/support.py. I will treat these three test files as entry points for the refactoring.
I will replace tornado with asyncio progressively until all tornado dependencies are removed.
- Depend on the kiwipy develop branch at the beginning, but switch to the stable release branch later to guarantee that the asyncio version of plumpy also works with the tornado version of kiwipy.
- Rely on Python 3.5 features only, rather than on Python 3.7 (which added many new asyncio features), to make sure the refactoring works for all Python 3 versions.
- Test with asyncio by doing:
- Unit tests involving asynchronous code are constructed with
Process.step() call, and similarly use contextvar to record the stack of the current process.
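Here is one way such unit tests could be built, assuming plain unittest is kept. AsyncTestCase, DummyProcess and the test names are hypothetical. DummyProcess stands in for a real plumpy Process. Each test gets a fresh event loop and a run_async helper, and only Python 3.5-era asyncio calls are used. On Python 3.8+, unittest.IsolatedAsyncioTestCase provides a similar facility out of the box.

```python
import asyncio
import unittest


class AsyncTestCase(unittest.TestCase):
    """Hypothetical base class giving every test its own event loop."""

    def setUp(self):
        self.loop = asyncio.new_event_loop()

    def tearDown(self):
        self.loop.close()

    def run_async(self, coro):
        """Run a coroutine to completion on this test's loop and return its result."""
        return self.loop.run_until_complete(coro)


class DummyProcess:
    """Hypothetical stand-in for a plumpy Process with an asynchronous step."""

    def __init__(self):
        self.finished_steps = 0

    async def step(self):
        await asyncio.sleep(0)  # yield to the loop, as a real step might
        self.finished_steps += 1
        return self.finished_steps


class TestDummyProcessStep(AsyncTestCase):
    def test_single_step(self):
        process = DummyProcess()
        self.assertEqual(self.run_async(process.step()), 1)
        self.assertEqual(process.finished_steps, 1)

    def test_steps_accumulate(self):
        process = DummyProcess()
        for _ in range(3):
            self.run_async(process.step())
        self.assertEqual(process.finished_steps, 3)
```

Such a module runs with python -m unittest like any other test file. Creating a new loop per test keeps state from leaking between tests.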
After the code has been modified, run the integration tests in aiida-core to make sure all code changes work flawlessly with aiida-core>=1.1. Meanwhile, the test coverage of aiida-core should not be reduced.
tornado==5.0 supports native coroutines and wraps the asyncio event loop on Python 3.4+. The latest tornado versions have adopted native coroutines and are compatible with asyncio. Writing asyncio code is not much different from writing tornado asynchronous code. Once the refactoring that updates tornado to a higher version is done, switching to asyncio should therefore be easy. Whether to follow up with that switch is up to the circus community.
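The following sketch shows the mechanical translation, with a hypothetical fetch coroutine standing in for real I/O. The tornado version is shown only as a comment, since it needs tornado installed. In the translation, the decorated generator becomes a native coroutine, yield becomes await, and raise gen.Return(...) becomes a plain return.

```python
import asyncio

# Before: tornado-style generator coroutine (for comparison only).
#
#     from tornado import gen
#
#     @gen.coroutine
#     def fetch_twice():
#         first = yield fetch(1)
#         second = yield fetch(2)
#         raise gen.Return(first + second)
#
# After: a native coroutine that runs directly on the asyncio event loop.


async def fetch(value):
    """Hypothetical stand-in for an asynchronous I/O call."""
    await asyncio.sleep(0)
    return value * 10


async def fetch_twice():
    first = await fetch(1)
    second = await fetch(2)
    return first + second  # a plain return replaces gen.Return


if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    try:
        print(loop.run_until_complete(fetch_twice()))  # 30
    finally:
        loop.close()
```

On Python 3, tornado 5 runs on the asyncio loop. Both styles can therefore coexist while modules are converted one at a time.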
In the discussion of issue [5], @k4nar suggested making a PR to remove support for Python versions before 3.5. This is effectively equivalent to updating tornado to >6.0, which no longer supports Python 2.7 and 3.4 and requires at least Python 3.5.2. Work on moving to asyncio can then continue. That way, we can always fall back to tornado if a complete asyncio replacement proves too hard.
My plan is to pin tornado to >5.0 and <6.0. These versions support both Python 2.7.9+ and 3.5+, so most users can still use them even if the upgrade attempt fails.
The fundamental developer documentation consists of thorough docstrings for every class and important function. During the coding phase, the missing-docstring check should be turned off in .pylintrc. In the documentation stage it will be turned on again, and the necessary docstrings will be added.
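For example, the coding-phase switch could be a .pylintrc fragment like the one below. It is a sketch that follows pylint's standard rcfile layout. Since pylint 2.5, missing-docstring has been split into missing-module-docstring, missing-class-docstring and missing-function-docstring, and the old name should still disable all three.

```ini
# .pylintrc (fragment): silence docstring warnings during the coding phase
[MESSAGES CONTROL]
disable=missing-docstring
```

Removing the disable line again, or passing --enable=missing-docstring on the command line, restores the check for the documentation stage.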
Schedule of Deliverables
This project will:
asyncio (or at least by
- Write further developer documentation for plumpy and kiwipy, mostly as API documentation in the docstrings of useful classes and methods.
- Thoroughly test to make sure the asyncio version of
circus is working flawlessly with
- Deliverables include one PR each for plumpy and aiida-core, and two PRs for circus: one removing Python 2 compatibility and the other (stretch goal) replacing tornado with asyncio.
Mentors will help manage interactions with the
Community Bonding period
Early May-1 June (+12d):
- Familiarize myself with plumpy’s functionality.
- Familiarize myself with circus’s functionality.
- Identify which modules depend on asynchronous programming.
1-12 June (12d): plumpy migration
asyncio for all unit tests.
12-16 June (4d): plumpy tests & documentation
- Integration test to make sure the new version of plumpy works well with
- Fix bugs that appear in this test stage.
- Document the existing code of plumpy, not only for users but also for developers.
- Turn on the missing-docstring option in the pylint configuration file, and add comprehensive docstrings for useful classes and methods.
- Summarize the outcomes of this phase and prepare for the next coding phase.
(Remember to submit Phase 1 evaluations, deadline June 29)
16 June - 28 June (12d): aiida-core migration & tests
- Replace tornado dependencies in aiida-core with asyncio
- Start from aiida/engine, replacing coroutines with asyncio and registering tasks into
- Make sure the changes do not break other parts of
- Use the event loop provided by asyncio, and register the processes in this event loop.
- Steps within a process run synchronously, while different processes run asynchronously. Make sure not to break this behaviour.
- Write comprehensive tests.
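The intended behaviour can be sketched with plain asyncio. SketchProcess and main are hypothetical, and this is not aiida's Runner. Each step is an ordinary function with no await inside, so it always runs to completion. Processes are registered with the loop as tasks and yield control only between steps, so steps of different processes interleave but never overlap.

```python
import asyncio


class SketchProcess:
    """Hypothetical process whose steps run synchronously, one at a time."""

    def __init__(self, name, n_steps, log):
        self.name = name
        self.n_steps = n_steps
        self.log = log

    def step(self, index):
        # A synchronous step: nothing here can be interrupted by another process.
        self.log.append((self.name, index))

    async def run(self):
        for index in range(self.n_steps):
            self.step(index)
            # Yield to the event loop between steps so other processes can run.
            await asyncio.sleep(0)
        return self.name


async def main():
    log = []
    loop = asyncio.get_event_loop()  # the running loop inside a coroutine
    # "Register" processes with the event loop by scheduling them as tasks.
    tasks = [loop.create_task(SketchProcess(name, 2, log).run()) for name in ("p1", "p2")]
    results = await asyncio.gather(*tasks)
    return results, log
```

Running main() yields the log [("p1", 0), ("p2", 0), ("p1", 1), ("p2", 1)]. A test can assert on exactly this kind of ordering to check that the migration keeps the behaviour intact.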
28 June - 22 July (24d): migrate circus arbiter & client (stretch goal)
- Remove all Python 2 compatibility code provided through the six library.
- Remove backwards compatibility with Python versions before 3.5.2, and set the tornado version to 6.0.4.
- Fix the code so that all failing unit tests pass after upgrading the
- Start from
tornado event loop with
- Replace the original test runner (nose) with pytest, using pytest-asyncio for asynchronous unit tests.
22 July - 1 August (10d): circus migration continued (stretch goal)
If the arbiter and client are refactored successfully, the main body of this phase is almost finished. Then refactor the remaining parts of circus to work with
1-14 August (14d):
Further refine the tests for aiida-core. Make sure aiida-core works flawlessly with the new plumpy and the new circus, and that test coverage does not go down.
14 August - 21 August (7d):
- Document the existing code of kiwipy, not only for users but also for developers.
- Summarize the outcomes of the phase.
(Remember to submit Phase 2 evaluations, deadline July 31)
21-31 August (10d):
A buffer of 10 days has been kept for any unpredictable delay.
I am already familiar with the code of kiwipy, and I have contributed pull requests that have been accepted [6][7][8].
I have developed AiiDA plugins
aiida-deepmd [10] for training potential functions. I am currently leading the effort to translate the AiiDA documentation into Chinese [11].
I have experience programming in Golang, including the homework-like project stateflow [12], which required concurrent programming techniques. I am the main developer of sagar [13], a Python library with utilities to generate and inspect grid site structures in materials science. As the administrator of the compute cluster in our laboratory, I also have solid experience in building and maintaining HPC systems.
Why this project?
I have been using AiiDA to manage my high-throughput calculations for a long time, and it has greatly accelerated my academic research. I now have a paper under review that uses AiiDA for its high-throughput calculations [14]. AiiDA is well maintained, actively developed, and open to contributions, which makes it stand out among similar tools in related fields.
I have noticed in the community that AiiDA suffers somewhat from the outdated circus, which currently depends on tornado<5, while much of the Python ecosystem used by AiiDA is starting to require asyncio. This project will thus help AiiDA get rid of the dependency problems that occur frequently when installing other Python tools with
As an active user, I want to contribute to the project. This will let me learn advanced, standard techniques in Python programming, and also learn how to collaborate with others on open-source projects.
Following the suggestion of GSoC and NumFOCUS, I will post weekly recaps and work logs on my personal blog [15].
I will commit code changes frequently and keep the commit history clean by rebasing. To make evaluation easy for GSoC, I will make one PR each for plumpy and aiida-core. For circus, I will make two PRs: one removing Python 2 compatibility and the other replacing tornado with asyncio.
The code follows the AiiDA coding style [16]. pre-commit will be turned on to check and fix the code style.
plumpy, https://github.com/aiidateam/plumpy
circus, https://github.com/circus-tent/circus
aiida-core, https://github.com/aiidateam/aiida-core
free wish: tornado upgrade or switch to asyncio?, https://github.com/circus-tent/circus/issues/1124#issuecomment-600057407
aiida-ce, https://github.com/unkcpz/aiida-ce
aiida-deepmd, https://github.com/unkcpz/aiida-deepmd
stateflow, https://github.com/unkcpz/stateflow
sagar, https://github.com/scut-ccmp/sagar/graphs/contributors
Academic paper, https://arxiv.org/abs/2003.01481
Personal blog, http://morty.tech
AiiDA coding-style, https://github.com/aiidateam/aiida-core/wiki/Coding-style