"Andrew's blog about Grids, Webs, Security and other interesting™ Stuff"

SlashGrid Reloaded
Thu 15 June 2006 10:06pm

Several years ago I had two Grid security projects: GridSite, which has grown and grown; and SlashGrid, which never made it beyond a demonstrator. I've now resurrected SlashGrid, this time for distributed storage rather than as a secure container filesystem.

The original question went something like this: if each host on the Grid has thousands of potential users, how should Unix processes running on their behalf interact with the local Unix system? Mostly this boils down to file access, since other resources like shared memory segments and network sockets are dynamic and tied to particular (groups of) processes.

The simplest solution, and the first step, was the Pool Accounts system, which just assigned Unix UIDs to users as they arrived, rather like DHCP. I did provide an example recycling script, but since files were still stored directly in the Unix filesystem, all of the files of a particular UID had to be cleared up before recycling.
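The DHCP-like leasing described above can be sketched roughly as follows. This is only an in-memory illustration, not the actual Pool Accounts code: the class name, the account names (pool001, pool002), the example DNs and the files_cleared flag are all made up for the example.

```python
class PoolAccounts:
    """Lease Unix accounts from a fixed pool to Grid users, DHCP-style.

    Hypothetical sketch: the real system keeps its state on disk,
    whereas this keeps everything in memory.
    """

    def __init__(self, accounts):
        self.free = list(accounts)   # accounts not currently leased
        self.leases = {}             # user DN -> leased account name

    def acquire(self, dn):
        # A returning user keeps the account they already hold.
        if dn in self.leases:
            return self.leases[dn]
        if not self.free:
            raise RuntimeError("pool exhausted")
        account = self.free.pop(0)
        self.leases[dn] = account
        return account

    def recycle(self, dn, files_cleared):
        # An account can only be reused once all of its files are gone,
        # because the files live directly in the Unix filesystem.
        if not files_cleared:
            raise RuntimeError("files still owned by this account")
        self.free.append(self.leases.pop(dn))


pool = PoolAccounts(["pool001", "pool002"])
first = pool.acquire("/C=UK/O=Example/CN=Alice")
again = pool.acquire("/C=UK/O=Example/CN=Alice")
second = pool.acquire("/C=UK/O=Example/CN=Bob")
```

The files_cleared check is the awkward part: it stands in for the clean-up that has to happen before a UID can safely be handed to someone else.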

From the start it was clear that files would need to be shared between users of the same VO. So the Pool Accounts could be divided into subpools corresponding to particular sets of users. The idea was that all of the UIDs in each subpool would be assigned to the same Unix group, and this would allow coarse-grained access control over shared files. But even though LCG is still using this approach, it was just meant to be a stopgap until proper policy-based access control came along, and that's where SlashGrid came in.

The old SlashGrid used the Coda module that was a standard part of the Linux kernel, and this let me connect a daemon running in userspace to the kernel, to provide the virtual filesystem. The main SlashGrid ("/grid") filesystem worked just like a normal Unix filesystem, except that file access and ownership were derived from GACL policies in XML, written in terms of Grid DNs and VOs. All of the files in the backend storage were owned by root and only accessible to root, so nothing was owned by the pool accounts themselves and they could be recycled at will. (I could even use the existence of any process owned by a UID as the reservation / lock, rather than the current file-based locking of pool UIDs in the gridmapdir.)

I did also add an HTTP(S) filesystem called curlfs, but this wasn't able to read small sections of large files efficiently, due to limitations of the Coda module.

As the EU DataGrid finished and LCG started, the limitations of the Pool Accounts became more and more obvious, but we've managed to carry on with them all the same, and there wasn't any particular desire to adopt something like SlashGrid. And really the problem wasn't gridifying local filesystems, but gridifying the filesystems exported by local storage, and controlling access to them via DNs, VO membership lists and VOMS attributes. With that picture, it was natural to push ahead with GridSite file storage, since it also uses GACL access control and DNs/DN-Lists/VOMS.

The other thing that's happened is the emergence of "Storage Farms", with mass disk storage on the worker nodes of a farm, mixed in with the CPU power of the farm. Because of the plummeting price of disk, this model has become more and more obvious, but it makes file access harder, because existing solutions like NFS don't scale up to a thousand-node farm where every node is a file server. Running GridSite on each node is straightforward and would happily scale (they're all independent anyway) but then the problem becomes how to present these file servers to the other worker nodes.

So re-enter SlashGrid.

This time I've been able to use the FUSE module, which became mainstream in the 2.6.14 kernel, and which doesn't have the shortcomings of the Coda module. This means that the current SlashGrid implementation can do partial file access via HTTP(S) - in fact, it fetches and caches 4096-byte blocks of files in response to read() calls. But with the emphasis now on local distributed storage, the access control is entirely left up to the GridSite file server.
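To make the block caching concrete, here is a rough sketch of how a read() at an arbitrary offset can be served from fixed-size cached blocks. It is not taken from the SlashGrid source: the BlockCache class and the fetch_range callable are hypothetical, and a fake in-memory "server" stands in for the HTTP(S) range requests so the example runs on its own.

```python
BLOCK_SIZE = 4096  # block size mentioned in the post


class BlockCache:
    """Serve byte-range reads from cached fixed-size blocks (illustrative only)."""

    def __init__(self, fetch_range):
        # fetch_range(start, end) must return bytes start..end inclusive,
        # in the style of an HTTP "Range: bytes=start-end" request.
        self.fetch_range = fetch_range
        self.blocks = {}  # block index -> bytes

    def read(self, offset, size):
        if size <= 0:
            return b""
        first = offset // BLOCK_SIZE
        last = (offset + size - 1) // BLOCK_SIZE
        data = bytearray()
        for index in range(first, last + 1):
            if index not in self.blocks:
                start = index * BLOCK_SIZE
                self.blocks[index] = self.fetch_range(start, start + BLOCK_SIZE - 1)
            data += self.blocks[index]
        # Trim the leading part of the first block and anything past the request.
        skip = offset - first * BLOCK_SIZE
        return bytes(data[skip:skip + size])


# Fake remote file and fetcher, standing in for a real HTTP(S) server.
content = bytes(i % 251 for i in range(10000))
requests = []


def fake_fetch(start, end):
    requests.append((start, end))
    return content[start:end + 1]


cache = BlockCache(fake_fetch)
chunk = cache.read(4000, 200)  # spans blocks 0 and 1
```

A 200-byte read that straddles a block boundary costs two block fetches the first time, and later reads inside those blocks cost nothing. A real implementation would also need some way of expiring blocks, which this sketch ignores.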

Last year I also added SiteCast support to GridSite and the htcp commands, which allows replicas of files to be located via UDP multicast. This is squarely aimed at Storage Farms too, and so it's also been natural to include this in SlashGrid: so it's now not only possible to access files elsewhere on your storage farm via the POSIX filesystem API (open, read, close etc) with a path like /grid/https/grid1.hep.man.ac.uk:488/dir/file.txt, but you don't even need to know where the files are, since /grid/https/sitecast.hep.man.ac.uk:488/dir/file.txt will also work, whichever member of the sitecast.hep.man.ac.uk domain has the file.
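The path convention in those examples is simple enough to sketch. The function below is just my reading of the example paths, not SlashGrid's own code: grid_path_to_url is a made-up name, accepting plain http as well as https is an assumption, and the SiteCast multicast lookup itself isn't modelled at all.

```python
def grid_path_to_url(path, mount="/grid"):
    """Map /grid/<scheme>/<host:port>/<path> to <scheme>://<host:port>/<path>.

    Hypothetical helper for illustration only.
    """
    prefix = mount.rstrip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError("not under the SlashGrid mount point")
    scheme, _, rest = path[len(prefix):].partition("/")
    if scheme not in ("http", "https") or not rest:
        raise ValueError("expected /grid/<scheme>/<host:port>/<path>")
    return scheme + "://" + rest


url = grid_path_to_url("/grid/https/grid1.hep.man.ac.uk:488/dir/file.txt")
```

With the path from the post, url comes out as https://grid1.hep.man.ac.uk:488/dir/file.txt. The SiteCast form would map the same way, with the extra step of finding out which member of the domain actually holds the file.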

There's more about SlashGrid in the GridSite Wiki, and the automated builds of GridSite 1.3.1 onwards include it.

Contact info
Dr Andrew McNab,
Department of Physics and Astronomy,
University of Manchester,
Manchester,
United Kingdom,
M13 9PL

Andrew.McNab@cern.ch
Phone: +44-161-306-6474
Fax: +44-161-273-5867

© 2004-6 Andrew McNab <Andrew.McNab@manchester.ac.uk>