Offline Computing at SELEX (E781) During Data Taking
Peter S. Cooper
January 12, 1995
SELEX is a high-statistics charmed baryon hadronic production and decay
experiment. It incorporates an online software filter to reduce the number
of charm candidates written to tape by a factor of ~20. This filtering
technique greatly shortens the subsequent analysis time required to produce
physics results from the experiment. It should also allow charm signals to be
observed while the data are being taken, in a "near-online" analysis mode in
which data are analyzed within several days or weeks of being taken. This
capability will allow us to optimize the charm yield of both the electronic
trigger and the software filter and thereby improve the sensitivity of the
experiment. In order to make this technique work and to fully exploit its
potential, SELEX will require substantial offline computing resources during
the data-taking phase of the experiment.
This note outlines the computing model we believe will most efficiently meet
these needs and identifies the resources that are still missing. The basic
idea is to copy 10% of the files written to tape into a hierarchical file store
in the computer center using the network. We want to be able to access those
files for an indefinite period of time from any of the center UNIX systems:
FNALU, CLUBS, and the UNIX farms.
The enabling technology at the experiment which permits this strategy is the
experiment's plan to spool all data logged to tape through disk files. This
plan is described in more detail in Reference 1. In brief, we will spool about
4 hours of data on disk and write data tapes as a series of files copied from
those disks to 8mm tape. A typical run contains ten 200-Mb files. It is
straightforward to copy the first of those files across the network into the
center after it has been logged to tape at the experiment and before it becomes
the oldest file in the disk cache and is deleted to make space. This plan is
failsafe: if the network is unavailable or fails, the file is already logged to
tape. If a particular run is important, 'sneakernet' still works to get the
data to the center.
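As a concrete illustration of this copy step, the following is a minimal sketch
in Python, assuming hypothetical spool and staging directory names and a
caller-supplied check that a file has already been logged to tape; the actual
spooling scheme is the one described in Reference 1.

    # Illustrative sketch only: copy the first logged file of each run from the
    # experiment's disk spool to the computer center before the spool area
    # recycles it.  Directory names and the logged-to-tape check are hypothetical.
    import os
    import shutil

    SPOOL_DIR = "/spool/e781"             # hypothetical 4-hour disk cache at the experiment
    CENTER_DIR = "/center/e781/import"    # hypothetical network-mounted staging area

    def first_file_of_run(run_number):
        """Return the oldest spool file belonging to a given run, or None."""
        names = sorted(f for f in os.listdir(SPOOL_DIR)
                       if f.startswith("run%06d" % run_number))
        return os.path.join(SPOOL_DIR, names[0]) if names else None

    def copy_to_center(path, logged_to_tape):
        """Copy a spool file to the center only after it is safely on tape.
        If the network copy fails we simply stop: the tape copy already exists,
        so nothing is lost and the plan remains failsafe."""
        if path is None or not logged_to_tape(path):
            return False
        try:
            shutil.copy(path, CENTER_DIR)
            return True
        except (IOError, OSError):
            return False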
This capability, in effect, puts the computer center resources online
at the experiment. CPU-intensive "online" monitoring jobs can be run on
FNALU or CLUBS against a just-copied file, and the results can be available
while the same run is still in progress. Jobs can escalate in data volume.
Code developed interactively at the experiment can be run as either a batch
or an interactive job against one whole file on FNALU. The same job can
later be submitted to CLUBS to run against a large number of files. If such
a job is sufficiently CPU intensive, it should be able to run on a farm
system that has access to the same file base. All of these jobs should
require no user tape mounts.
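To make the escalation idea concrete, here is a minimal sketch of a single
analysis driver that can be pointed at one just-copied file interactively or
at a long file list in batch; the event reader and the "histogram" here are
trivial placeholders, and the FNALU/CLUBS/farm batch submission commands are
not shown.

    # Illustrative sketch: the same driver runs over one file or many,
    # with no user tape mounts, because the files live in the center store.
    import sys

    def read_events(f):
        """Placeholder event reader: yields fixed-size raw records."""
        while True:
            record = f.read(8192)
            if not record:
                break
            yield record

    def analyze_file(path, histograms):
        """Accumulate a simple placeholder quantity from one raw data file."""
        with open(path, "rb") as f:
            for record in read_events(f):
                histograms["records"] = histograms.get("records", 0) + 1

    def main(file_list):
        histograms = {}
        for path in file_list:      # one file interactively; hundreds in batch
            analyze_file(path, histograms)
        print(histograms)

    if __name__ == "__main__":
        main(sys.argv[1:])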
The major component I believe to be missing in order to permit this plan is
the third level of the NEEDFILE hierarchical store: non-user vault tapes. To
set a scale on this need, E781 expects to write about 3000 5-Gb 8mm tapes over
the two calendar years of the next fixed-target run. This is a direct
extrapolation from our MOU to the >4000 hours of beam for data taking we can
expect in the run. E781 would require 300 8mm-tape equivalents (1.5 Tb) of
third-level cache tape in the center to hold 10% of the E781 raw data. This
amount should be increased by a small factor to leave room for the output data
sets generated by processing these raw data. A factor of 2, for a total of 600
tapes (3 Tb), seems reasonable. Whether the tape technology adopted by the
center to provide this storage is 8mm or something else is a matter of
complete indifference to us. Provided it is reasonably reliable and the cache
hierarchy is sized so that only a small fraction of file requests require a
manual tape mount, it can be any technology that is cost effective in the
center environment.
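The sizing arithmetic above can be restated in a few lines; the numbers are
simply those quoted in the text.

    # Sizing arithmetic for the third-level (vault tape) store.
    raw_tapes     = 3000           # 5-Gb 8mm raw data tapes expected over the run
    tape_capacity = 5.0            # Gb per tape
    sample_frac   = 0.10           # fraction of the raw data copied to the center
    output_factor = 2              # head room for processed output data sets

    cache_tapes = raw_tapes * sample_frac                 # 300 tape equivalents
    cache_tb    = cache_tapes * tape_capacity / 1000.0    # 1.5 Tb
    total_tapes = cache_tapes * output_factor             # 600 tapes
    total_tb    = cache_tb * output_factor                # 3 Tb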
Several modifications of the NEEDFILE system are required in order to
permit the kind of usage I am proposing here. I believe it is not possible,
today, to NEEDFILE data directly to a farm system. The concept of a large file
has to be introduced as a unit of managed data in addition to the VSN. A file
in this context means something >20 Mb in size. We are not requesting the
management of an infinite number of infinitesimal files; if we wish to do that,
we will use tar to encapsulate directory structures into files large enough to
be managed by the NEEDFILE system. User commands are required to identify
files to be imported into and deleted from the store. Some kind of accounting
for the space used in the last level of the cache will be required, since
throwing away the least recently used file is exactly what we don't want.
Perhaps NEEDFILE is the wrong platform on which to build this system? We will
defer to expert advice on this point.
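For illustration only, the following sketch shows the kind of space accounting
we have in mind, with an explicit user delete instead of a least-recently-used
eviction; the quota numbers and interface are hypothetical and are not a
description of the existing NEEDFILE system.

    # Illustrative sketch of per-experiment space accounting in the last
    # cache level.  Files leave the store only by explicit deletion.
    class VaultAccount:
        def __init__(self, quota_gb):
            self.quota_gb = quota_gb
            self.files = {}                      # file name -> size in Gb

        def used_gb(self):
            return sum(self.files.values())

        def import_file(self, name, size_gb):
            """Admit a file only if the account's quota allows it."""
            if self.used_gb() + size_gb > self.quota_gb:
                raise RuntimeError("over quota: an explicit delete is required first")
            self.files[name] = size_gb

        def delete_file(self, name):
            """Space is reclaimed only when a user explicitly deletes a file."""
            self.files.pop(name, None)

    # Example: an E781 account sized to the 3-Tb estimate above.
    e781 = VaultAccount(quota_gb=3000.0)
    e781.import_file("run001234_file01", 0.2)    # a typical 200-Mb raw data file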
The need to distribute data to the collaboration can be handled by copying
files from this store to any required medium or network. The post-run analysis
algorithm development and the generation of the constants database can be done
with the same file base and computing resources. If this system works well, we
will only need to read our data tapes at PASS1 time. The tapes will have to
be sampled to verify that they can be read; other than this, they should serve
for PASS1 input, error recovery, and dead storage only.
We used a model very similar in organization and size to that proposed
above to analyze our last experiment, E761. In that case the computer system
was the Amdahl, and the external and internal tape technologies were 9-track
and 3480, respectively. The data volumes were scaled down by the 200 Mb/5 Gb
ratio implied by the old tape technologies. In E761 we had 4000 raw data tapes
and still have about 100 3480 vault tapes. Our "files" were single 3480 tapes
(VSNs) that held Amdahl disk images (tar files in today's language). We did
not build this system until PASS1 time, so there was no input via the network;
that was not technically feasible in 1990 in any case. This system was an
extremely effective way to perform an analysis, particularly in our
collaboration, where more than half the physicists are from foreign
institutions.
The character of E781 is similar. The quality of today's networks is
sufficiently good that we have people working on E781 computing projects in
Russia, Brazil, and Mexico, among other places. The ability to log in over the
network and submit batch jobs to run on the same computer systems against the
same file base gives a foreign E781 physicist access to the same tools that a
local Fermilab physicist has available. The network connectivity need only be
good enough to get the batch job submitted and to ftp the histograms back home.
This model will only succeed if access to the data and codes is largely
transparent. A non-local physicist cannot work successfully if there is a
continual negotiation among the graduate students over who gets the tape and
who gets the disk space to stage it. This is the overall problem I am trying
to address with this system. It is an important problem to solve during the
analysis phase of the experiment and is critical during the checkout and
data-taking phases. Not every E781 physicist who needs access to the data in
order to check that his part of the experiment is working correctly can be at
Fermilab all the time.
References
1. Data Logging in E781, Peter Cooper, December 27, 1994.