Created 16 March 1998 by kcarlson@arsc.edu, no updates anticipated.


UNICOS Accounting Class: March 1998 Minneapolis


Date:    Mon, 16 Mar 1998 15:14:14 -0800
To:      technical_services
From:    Kurt Carlson 
Subject: A quick accounting trip report
Cc:      bastille,horner,hagestead

to: TS
cc: Derek, Barbara, Gary

From my perspective the accounting class was quite good.
The instructor (David Wright) also does performance monitoring
and tuning work and covered use of the accounting data for performance
management as well as resource accounting.  Since he's been
redirected of late to some IRIX support he was able to toss
in some useful information there.

The first 2 days of the class covered the setup and configuration
of Cray CSA accounting and a detailed overview on the nature of
the data.  The third day covered customization via csagcon and csagfef.


Disappointments:
---------------

The training facility in Eagan is being closed and most of the
UNICOS curriculum is being cancelled.... non-revenue generating.

As we've heard before, UNICOS (10.0) is feature-frozen.

Instructor sympathized with our (and others') needs for either
detailed structure layouts and definitions (i.e., docs) or source
level access for accounting, but he's been losing that internal battle
for years.  csafef and csafef2 are meaningful only with a source
license.  The corporate Cray|SGI response for these needs seems
to be to let 3rd parties deliver accounting solutions and there
is at least one product which does this... including interval
accounting which may approximate resman.  The package cited,
TeamQuest (http://www.teamquest.com), seems to come from a company more
focused on performance analysis than resource accounting; we'll
look into it a little further.

Instructor felt a J90 may not be the best choice for an NFS
file server... the primary concern being UNICOS's management
of file system caching (particularly with fragmentation due
to automounts).  If this is something which hasn't been discussed
we may want to follow up on it further.  With the accounting
data synchronized with sar it should be possible to determine
cache effectiveness.
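
A rough sketch of the cache-effectiveness check, assuming we can pull
logical and physical read counts per interval out of sar.  The sample
layout and names here are hypothetical, and the formula is the conventional
System V "sar -b" read-cache definition; UNICOS sar counters may be named
or defined differently.

```python
def cache_hit_pct(logical_reads, physical_reads):
    """Percent of logical reads satisfied from cache (0-100)."""
    if logical_reads <= 0:
        return 0.0  # no read activity in the interval
    hits = max(logical_reads - physical_reads, 0)
    return 100.0 * hits / logical_reads


# Hypothetical per-interval samples: (interval label, logical reads, physical reads)
samples = [
    ("00:00", 1000, 100),
    ("00:10", 500, 400),
    ("00:20", 0, 0),
]


def low_cache_intervals(samples, threshold_pct=50.0):
    """Return labels of intervals with read activity and a hit ratio below threshold_pct."""
    return [
        label
        for label, lread, bread in samples
        if lread > 0 and cache_hit_pct(lread, bread) < threshold_pct
    ]


if __name__ == "__main__":
    for label, lread, bread in samples:
        print(label, round(cache_hit_pct(lread, bread), 1))
    print("low-cache intervals:", low_cache_intervals(samples))
```

Intervals flagged this way could then be matched against the accounting
records for the same interval to see which work was running.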

Class dealt with MPP accounting only as implemented for the T3D under
UNICOS (vs. UNICOS/mk).  However, CSA accounting is implemented
under U/mk and much of the information will translate, although some
of the performance metrics are different.


Other News:
----------

IRIX enhanced accounting has some features we may be interested in;
it's part of the sat enhanced security.  It does include project accounting.
CSA accounting will be ported to IRIX (after 6.5).

Most of the reporting we desire can be done with csagcon and csagfef.
The csagcon consolidation without jobs as a criterion does not aggregate
timings (such as queue time), so the turnaround and expansion factors required
by HPC will require two-step generation (back-end program or spreadsheet,
likely a program).
Our other reporting needs appear to be manageable with csagcon|gfef
with simply acid+uid aggregation.
In all cases we will need to back-end the reporting tools for
history|trend analysis and likely project|sub-project hierarchies.
In the past this was managed by archiving historical reports and to
a degree by resman.  The exact form this should take will require further
discussion... could be a back-end database or some tools (programs)
for consolidating reports.
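
To make the two-step idea concrete, here is a minimal sketch of the second
(back-end) step.  It assumes the first step can emit one record per job with
queue wait and run time in seconds.  The record layout and field names are
hypothetical, not csagcon or csagfef output, and the definitions used
(turnaround = queue wait + run time, expansion factor = turnaround / run
time) are the common ones and would need to be checked against what HPC
actually requires.

```python
from collections import defaultdict

# Hypothetical per-job records from a first-step report (times in seconds).
jobs = [
    {"acid": 100, "uid": 501, "queue_wait": 600, "run_time": 1200},
    {"acid": 100, "uid": 501, "queue_wait": 0, "run_time": 300},
    {"acid": 200, "uid": 502, "queue_wait": 900, "run_time": 300},
]


def summarize(jobs):
    """Aggregate per-job timings by (acid, uid).

    Assumed definitions: turnaround = queue_wait + run_time,
    expansion factor = total turnaround / total run_time.
    """
    totals = defaultdict(lambda: {"jobs": 0, "queue_wait": 0, "run_time": 0})
    for job in jobs:
        t = totals[(job["acid"], job["uid"])]
        t["jobs"] += 1
        t["queue_wait"] += job["queue_wait"]
        t["run_time"] += job["run_time"]

    report = {}
    for key, t in totals.items():
        turnaround = t["queue_wait"] + t["run_time"]
        report[key] = {
            "jobs": t["jobs"],
            "mean_turnaround": turnaround / t["jobs"],
            # Leave the factor undefined when there is no run time to divide by.
            "expansion": turnaround / t["run_time"] if t["run_time"] else None,
        }
    return report


if __name__ == "__main__":
    for (acid, uid), row in sorted(summarize(jobs).items()):
        print(acid, uid, row)
```

Keeping the per-job records as the input, rather than already-consolidated
totals, is what preserves the queue timings that consolidation drops.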

Also of concern is tracking of resource allocation and utilization
(resman).  Since we still need to do this and not just report usage
we'll need to discuss|decide whether we want to try to maintain this
within the udb or some back-end database.  A back-end database may
facilitate trend reporting.

As has been discussed, a live system report will need to be developed
as accounting only deals with terminating processes... purpose would
be to alert us, at least daily, of long-running jobs which are exceeding
resource allocations or are potential runaways.
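
A minimal sketch of what the daily check might look like, assuming we can
get a snapshot of running jobs with CPU used, elapsed time, and a CPU
allocation.  The snapshot format, field names, and the 24-hour threshold are
all hypothetical; how the snapshot is gathered on the live system is left
open.

```python
# Hypothetical snapshot of running jobs (all times in seconds).
running = [
    {"jid": 11, "uid": 501, "cpu_sec": 7200, "elapsed_sec": 90000, "cpu_limit_sec": 3600},
    {"jid": 12, "uid": 502, "cpu_sec": 100, "elapsed_sec": 4000, "cpu_limit_sec": 3600},
    {"jid": 13, "uid": 503, "cpu_sec": 10, "elapsed_sec": 200000, "cpu_limit_sec": 3600},
]


def flag_jobs(running, max_elapsed_sec=86400):
    """Return (jid, reasons) for jobs over allocation or running past max_elapsed_sec."""
    flagged = []
    for job in running:
        reasons = []
        if job["cpu_sec"] > job["cpu_limit_sec"]:
            reasons.append("over CPU allocation")
        if job["elapsed_sec"] > max_elapsed_sec:
            reasons.append("long-running")
        if reasons:
            flagged.append((job["jid"], reasons))
    return flagged


if __name__ == "__main__":
    for jid, reasons in flag_jobs(running):
        print("job", jid, "-", ", ".join(reasons))
```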


Some other tidbits:
------------------

Recommended accounting runs be moved to midnight (facilitates some
reporting with a cleaner interval... acctcom in particular has
some clock rollover restrictions).

Recommended accounting and sar intervals be synchronized (facilitates
use of accounting data to isolate service and performance problems or
bottlenecks).
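
To illustrate why shared interval boundaries help: once both data sets use
the same boundaries, accounting records can be bucketed by interval and laid
next to the sar sample for that interval.  The sketch below assumes a
10-minute interval and made-up record layouts (nothing here is real CSA or
sar output).  Since accounting records are written when a process
terminates, usage is attributed to the interval in which the process ended.

```python
from collections import defaultdict

INTERVAL_SEC = 600  # assumed shared interval length (10 minutes)


def interval_start(timestamp, interval=INTERVAL_SEC):
    """Round a timestamp (seconds since midnight) down to its interval start."""
    return timestamp - (timestamp % interval)


# Hypothetical accounting records: (process end time, CPU seconds used)
acct = [(30, 5.0), (590, 2.5), (600, 1.0), (1250, 4.0)]

# Hypothetical sar samples: interval start -> percent busy
sar = {0: 12.0, 600: 3.0, 1200: 80.0}


def join_by_interval(acct, sar):
    """Return {interval start: (accounted CPU seconds, sar percent busy)}."""
    cpu = defaultdict(float)
    for end_time, cpu_sec in acct:
        cpu[interval_start(end_time)] += cpu_sec
    return {start: (cpu.get(start, 0.0), busy) for start, busy in sorted(sar.items())}


if __name__ == "__main__":
    for start, (cpu_sec, busy) in join_by_interval(acct, sar).items():
        print(start, cpu_sec, busy)
```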

Recommended the use of archive1 vs. archive2 exit of csarun to
retain the pre-processed accounting files instead of the
session record files since some information is lost in consolidation
(csabuild).

I personally recommend we enable performance accounting on denali so
we have comparison metrics for the J90.  I'll try to get to this in the
next week... and we'll need to enable on J90 after we enter the user
services level of testing.

The person re-writing the IRIX accounting documentation stopped by
Thursday afternoon soliciting opinions on what we'd like to see.
I have her email address...  I've already asked for explicit
documentation of accounting structure contents unless they are
going to provide documented conversion routines (API or source).

Device level accounting has been retired on UNICOS.

CMG (Computer Measurement Group: X/Open, POSIX, etc. sponsored) has
drafted (and approved) a specification for performance monitoring:
C427 Systems Management: Universal Measurement Architecture (UMA)
  (ISBN 1-85912-117-7)
We can expect to see SGI/Cray abiding by this as it's adopted within
the marketplace (assuming it is).  Can be found on-line under:
  http://www.opengroup.org/public/pubs
We may want to have a copy ordered for reference purposes.

I found a couple of techniques where we may be able to incorporate
our project hierarchy into csagfef reporting vs. back-ending the
sub-project reporting.  I need to try these... if these don't
pan out we'll need to handle it with back-end reporting (e.g.,
programs).
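
If it does come down to a back-end program, the sub-project roll-up itself
is simple.  A minimal sketch, assuming we keep a table mapping each
sub-project to its parent; the project names, the mapping, and the usage
figures below are made up for illustration.

```python
# Hypothetical hierarchy: project -> parent project (None marks the top level).
parent = {
    "arsc": None,
    "physics": "arsc",
    "ocean": "arsc",
    "ocean-ice": "ocean",
}

# Hypothetical direct usage per project (e.g., CPU hours from a report).
usage = {"physics": 10.0, "ocean": 5.0, "ocean-ice": 2.5}


def rollup(usage, parent):
    """Add each project's direct usage to itself and to every ancestor.

    Assumes the hierarchy has no cycles and every project appears in parent.
    """
    totals = {name: 0.0 for name in parent}
    for project, amount in usage.items():
        node = project
        while node is not None:
            totals[node] += amount
            node = parent[node]
    return totals


if __name__ == "__main__":
    for name, total in rollup(usage, parent).items():
        print(name, total)
```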

Instructor provided a rather extensive tar kit which includes
a variety of 'share-ware' customizations of existing csa accounting,
including sources to a number of the modules.  This included
a second kit for IRIX.  I'll get these to a common location after
I've worked with it some.


Unrelated tidbit (there is life after accounting):
----------------
I caught the opening night performance of _Riverdance_ in Minneapolis
after the class Thursday.  Outstanding.... it will be in Seattle in
May (http://www.riverdance.com/), get your tickets early :).

k (with input from Dale & Derek)