Offline Computing for SELEX during the run

Offline Computing at SELEX (E781) During Data Taking

Peter S. Cooper

January 12, 1995

SELEX is a high-statistics charmed baryon hadronic production and decay experiment. It incorporates an online software filter to reduce the number of charm candidates written to tape by a factor of ~20. This filtering technique is very powerful in terms of shortening the subsequent analysis time required to produce physics results from the experiment. It should also allow charm signals to be observed while the data is being taken, in a "near-online" analysis mode where data is analyzed within several days or weeks after it is taken. This capability will allow us to optimize the charm yield of both the electronic trigger and the software filter to improve the sensitivity of the experiment. In order to make this technique work and to fully exploit its potential, SELEX will require substantial offline computing resources during the data taking phase of the experiment.

This note is an attempt to outline the computing model we believe is most efficient to meet these needs and to identify any missing resources. The basic idea is to copy 10% of the files written to tape into a hierarchical file store in the computer center using the network. We want to be able to access those files for an indefinite period of time from any of the center UNIX systems: FNALU, CLUBS and the UNIX farms.

The enabling technology at the experiment which permits this strategy is the experiment's adoption of a plan to spool all data logged to tape through disk files. This plan is described in more detail in Reference 1. In brief, we will spool about 4 hours of data on disk and write data tapes as a series of files copied from those disks to 8mm tape. A typical run comprises about ten 200 Mb files. It is straightforward to copy the first of those files across the network into the center after it has been logged to tape at the experiment and before it becomes the oldest file in the disk cache and is deleted to make space. This plan is failsafe: if the network is unavailable or fails, the file is already logged to tape. If a particular run is important, 'sneakernet' still works to get the data to the center.
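As an illustration only, the following sketch shows one way the copy step might look. The directory name, host name and transfer method (a plain FTP put, written in Python) are all assumptions for the purpose of the sketch; the real transfer would use whatever data logger and network tools the experiment adopts.

    import os
    import ftplib

    SPOOL_DIR = "/spool/e781"            # hypothetical disk cache at the experiment
    CENTER_HOST = "fileserver.fnal.gov"  # hypothetical gateway to the center file store

    def copy_first_file_of_run(run_files):
        """Copy the first (already tape-logged) file of a run into the center
        file store, giving roughly a 10% sample of the raw data."""
        first = sorted(run_files)[0]
        ftp = ftplib.FTP(CENTER_HOST)
        ftp.login()                                   # anonymous or site account
        f = open(os.path.join(SPOOL_DIR, first), "rb")
        ftp.storbinary("STOR " + first, f)            # push the file across the network
        f.close()
        ftp.quit()

    # e.g. copy_first_file_of_run(["run1234.file01", "run1234.file02"])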

This capability, in effect, puts the computer center resources online at the experiment. CPU-intensive "online" monitoring jobs can be run on FNALU or CLUBS against a just-copied file, and the result could be available while the same run is still in progress. Jobs can escalate in data volume: code developed interactively at the experiment can be run as either a batch or an interactive job against one whole file on FNALU, and the same job can later be submitted to CLUBS to run against a large number of files. If such a job is sufficiently CPU intensive, it should be possible to run it on a farm system which has access to the same filebase. None of these jobs should require user tape mounts.
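A sketch of this "escalating job" idea follows, under assumed names: analyze() is a stand-in for real monitoring or analysis code, and /e781/store is a hypothetical mount point for the center file store. The same script serves as an interactive check against the newest copied file and as a batch job fed an explicit file list.

    import glob
    import sys

    STORE = "/e781/store"          # hypothetical mount point for the file store

    def analyze(path):
        """Stand-in for the real monitoring/analysis code."""
        print("analyzing", path)

    if __name__ == "__main__":
        # Batch use (CLUBS or farm): file names are given on the command line.
        # Interactive use (FNALU): no arguments, take the newest copied file.
        files = sys.argv[1:]
        if not files:
            files = sorted(glob.glob(STORE + "/run*.dat"))[-1:]
        for f in files:
            analyze(f)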

The major component I believe to be missing in order to permit this plan is the third level of the NEEDFILE hierarchical store: non-user vault tapes. To set a scale on this need: E781 expects to write about 3000 5 Gb 8mm tapes over the two calendar year period of the next fixed target run. This is a direct extrapolation from our MOU to the >4000 hours of beam for data taking we can expect in the run. E781 would therefore require 300 8mm tape equivalents (1.5 Tb) of third-level cache tape in the center to hold 10% of the E781 raw data. This amount should be increased by a small factor to leave room for the output data sets generated by processing these raw data; a factor of 2, for a total of 600 tapes (3 Tb), seems reasonable. Whether the tape technology adopted by the center to provide this storage is 8mm or something else is a matter of complete indifference to us. Provided it is reasonably reliable, and the cache hierarchy is sized so that only a small fraction of file requests require a manual tape mount, it can be any technology that is cost effective in the center environment.
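The tape arithmetic above is simple enough to restate as a short calculation; every number here is taken from the text, and nothing new is introduced.

    RAW_TAPES = 3000     # 5 Gb 8mm tapes expected over the run
    TAPE_GB   = 5.0
    SAMPLE    = 0.10     # fraction of raw data copied to the center
    HEADROOM  = 2        # factor for output data sets from processing

    cache_tapes = RAW_TAPES * SAMPLE * HEADROOM   # 600 8mm tape equivalents
    cache_gb    = cache_tapes * TAPE_GB           # 3000 Gb = 3 Tb
    print(cache_tapes, "8mm tape equivalents =", cache_gb / 1000.0, "Tb")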

Several modifications of the NEEDFILE system are required in order to permit the kind of usage I am proposing here. I believe it is not possible, today, to NEEDFILE data directly to a farm system. The concept of a large file has to be introduced as a unit of managed data in addition to the VSN. "File" in this context means something of >20 Mb in size; we are not requesting the management of an infinite number of infinitesimal files. If we wish to do that, we'll use tar to encapsulate directory structures into files large enough to be managed by the NEEDFILE system. User commands are required to identify files to be imported into and deleted from the store. Some kind of accounting for the space used in the last level of the cache will also be required, since throwing away the least recently used file is exactly what we don't want. Perhaps NEEDFILE is the wrong platform on which to build this system? We will defer to expert advice on this point.
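The tar fallback mentioned above might look like the following sketch; the directory and file names are hypothetical, and the point is only that many small files become one file large enough (>20 Mb) for the store to manage as a single unit.

    import tarfile

    def bundle(directory, output):
        """Wrap a directory tree of small files into one tar file that the
        hierarchical store can manage as a single unit."""
        tar = tarfile.open(output, "w")
        tar.add(directory)         # recursively adds the whole tree
        tar.close()
        return output

    # e.g. bundle("/e781/calib/run_constants", "run_constants.tar")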

The need to distribute data to the collaboration can be handled by copying files from this store to any required medium or network. The post-run analysis algorithm development and the generation of the constants database can be done with the same file base and computing resources. If this system works well, we will only need to actually read our data tapes at PASS1 time. The tapes will have to be sampled to verify that they can be read; other than this, they should be for PASS1 input, error recovery and dead storage only.

We used a model very similar in organization and size to that proposed above to analyze our last experiment, E761. In that case the computer system was the Amdahl, and the external and internal tape technologies were 9-track and 3480 respectively. The data volumes were scaled down by the 200 Mb/5 Gb ratio implied by the old tape technologies. In E761 we had 4000 raw data tapes and still have about 100 3480 vault tapes. Our "files" were single 3480 tapes (VSNs) which held Amdahl disk images (tar files in today's language). We did not build this system until PASS1 time, so there was no input via the network; that wasn't technically feasible in 1990 in any case. This system was an extremely effective way to perform an analysis, particularly in our collaboration, where more than half the physicists are from foreign institutions.

The character of E781 is similar. The quality of today's network is sufficiently good that we have people working on E781 computing projects in Russia, Brazil and Mexico, among other places. The ability to log in over the network and submit batch jobs to run on the same computer systems against the same file base gives a foreign E781 physicist access to the same tools a local Fermilab physicist has available. The network connectivity need only be good enough to get the batch job submitted and to ftp the histograms back home. This model will only succeed if access to the data and codes is largely transparent. A non-local physicist cannot work successfully if there is a continual negotiation among the graduate students over who gets the tape and who gets the disk space to stage it. This is the overall problem I am trying to address with this system. It is an important problem to solve during the analysis phase of the experiment and is critical during the checkout and data taking phases. Not every E781 physicist who needs access to the data in order to check that his part of the experiment is working correctly can be at Fermilab all the time.

References

1. Data Logging in E781, Peter Cooper, December 27, 1994.