Big data is changing the face of research. Fields like astronomy and bioinformatics are leading the way in the management of enormous data sets containing some of the secrets of the universe and our DNA. To ensure that South African researchers can be global pioneers in these fields, a consortium of universities and research organisations has established a data-intensive research cloud, Ilifu (which means “cloud” in isiXhosa), and is inviting researchers in these two strategic science domains to start using the infrastructure.
Ilifu is a regional node, known as a Tier II node, in a national infrastructure, and is partly funded by the Department of Science and Technology (DST) through its Data-Intensive Research Initiative of South Africa (DIRISA). It brings together the existing infrastructure and expertise of the six partner institutions (see text box) and builds on them to create a regional hub for data-intensive research.
“One of the principles of Ilifu is that it is a research facility owned and managed by researchers,” says Professor Russ Taylor, director of the Inter-University Institute for Data-Intensive Astronomy (IDIA), and joint SKA Chair at UCT and the University of the Western Cape (UWC). “This means that the facility will engage with researchers to make sure it works for them.”
The building blocks of Ilifu were created in 2015 when UCT and North-West University worked together to build a research cloud. Since then, IDIA has added further nodes, and the result became known as the IDIA research cloud. The IDIA cloud, which has served as a test case for the use of cloud technology for collaborative research, is currently used by researchers across seven countries to collaborate on the huge data sets coming off the MeerKAT telescope.
The IDIA cloud is, however, not big enough to meet the requirements for the strategic science domains of astronomy and bioinformatics.
“As the MeerKAT rolls on, the size of the data coming off it is going to multiply by factors of 100,” says Taylor. “In addition, the data being produced by the genomics revolution in biology is creating data challenges similar in size to the MeerKAT and SKA.”
In 2016 the six partner institutions put in a successful bid to DIRISA through its National Integrated Cyberinfrastructure System (NICIS) to build a data-centric computing system that will provide computing power and data storage for projects in the strategic fields of astronomy and bioinformatics.
The infrastructure has been further expanded by investments from IDIA and H3BioNet, the Pan African Bioinformatics Network for H3Africa.
While the Ilifu facility is effectively made up of different nodes, bought at different times, through different funding, the model is that a researcher using the facility will have access to the full resources on offer, explains Professor Rob Simmonds, interim facility director of Ilifu.
The mixed funding model is, however, relevant to the end user, because allocation of the resources will be handled by different committees, based on the flow of funding.
“So 32 percent of the funding that went into the facility this year came from DIRISA,” explains Simmonds, “which means 32 percent of that part of the system is available through the DIRISA part of the allocation committee.”
The value of sharing a facility in this way is that when some users are not using the system, other groups can expand their use.
“Having more nodes and greater storage on one system means that when a specific project needs to perform an intensive burst of work, they can get more done than if the funding was used to buy a system for each project,” explains Simmonds.
The shared system offers a number of benefits, says Jasper Horrell, Ilifu project manager. “The fact that Ilifu is operated as a single facility means greater efficiencies in terms of scalability of the system, but also in terms of operation: only one group of supporting staff is needed.”
User allocation is guided by policy and works on an approach called Fairshare, which allows users access to their fair share over a reasonable period of time.
“If you look at a period of a month,” explains Horrell, “a specific project or user with an allocation of 20 percent of the system could expect 20 percent access to the system, but there could be spikes where a project might use 80 percent of the system over a week and no use on other weeks. But over that month-long period, this use would average out to 20 percent.”
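The averaging Horrell describes can be illustrated with a few lines of Python. This is only a sketch of the arithmetic in his example, assuming a four-week month and usage expressed as a percentage of the whole system; the function names are hypothetical, and it is not the formula the Ilifu scheduler actually uses.

```python
from statistics import mean


def average_usage(weekly_usage_percent):
    """Mean share of the system used across the given weeks, in percent."""
    return mean(weekly_usage_percent)


def fairshare_balance(allocation_percent, weekly_usage_percent):
    """Allocation minus average usage (hypothetical helper).

    Positive: the project has used less than its share over the period.
    Negative: it has used more. Zero: usage matches the allocation.
    """
    return allocation_percent - average_usage(weekly_usage_percent)


# Horrell's example: one week at 80 percent, then three idle weeks
# (the four-week month is an assumption made for illustration).
weeks = [80, 0, 0, 0]
print(average_usage(weeks))          # average share over the month
print(fairshare_balance(20, weeks))  # difference from a 20 percent allocation
```

A project that bursts to 80 percent for one week and then sits idle ends the month exactly on its 20 percent allocation, which is the behaviour the policy is designed to permit.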
How to get access to Ilifu
The Ilifu facility aims to support researchers in the two strategic science domains within the six partner institutions. But user access will be quite wide, as collaborators on approved projects will also have access to the facility.
Allocation of Ilifu resources is project-based: projects are approved, and each project receives an allocation against which individual researchers in that project can bill their tasks. Billing is measured in hours used on the facility.
“How this works practically,” explains Horrell, “is that a researcher will log onto the system and run their task against a project that has been approved by the relevant allocation committee.”
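The project-based billing described above can be pictured as a simple ledger of hours. The sketch below is illustrative only: the `Project` class, its fields and the rule that a task is refused once the allocation is exhausted are all assumptions, not a description of how Ilifu's accounting is implemented.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical record of an allocation, measured in hours."""
    name: str
    allocated_hours: float
    approved: bool = False   # set once an allocation committee approves it
    used_hours: float = 0.0

    def remaining_hours(self) -> float:
        return self.allocated_hours - self.used_hours

    def bill(self, researcher: str, hours: float) -> float:
        """Charge a researcher's task to this project; return hours left."""
        if not self.approved:
            raise PermissionError(f"{self.name} has not been approved")
        # Assumption: tasks that exceed the remaining allocation are refused.
        if hours > self.remaining_hours():
            raise ValueError(f"{researcher}: {self.name} has too few hours left")
        self.used_hours += hours
        return self.remaining_hours()


# Hypothetical project and figures, for illustration only.
survey = Project("example-survey", allocated_hours=100, approved=True)
print(survey.bill("researcher_a", 30))  # hours left after the first task
print(survey.bill("researcher_b", 50))  # hours left after the second task
```

Every researcher on the project draws from the same pool of hours, which is why access extends naturally to collaborators on an approved project.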
For now, users can find out more at ilifu.ac.za and apply for access at docs.ilifu.ac.za.
“Ilifu is the beginning of South Africa’s answer to a global big-data problem,” says Taylor. “Through the IDIA cloud, we have proved that the concept of shared data-intensive cloud infrastructure for research works. Through Ilifu, we are scaling this up into a more robust environment, and in the years to come, as data requirements grow, we hope to see Ilifu provide a valuable facility for global collaboration in data-intensive research.”