Created 16 March 1998 by kcarlson@arsc.edu, no updates anticipated.
Date: Mon, 16 Mar 1998 15:14:14 -0800
To: technical_services
From: Kurt Carlson
Subject: A quick accounting trip report
Cc: bastille,horner,hagestead

to: TS
cc: Derek, Barbara, Gary

From my perspective the accounting class was quite good. The instructor (David Wright) also does performance monitoring and tuning work and covered use of the accounting data for performance management as well as resource accounting. Since he's been redirected of late to some IRIX support he was able to toss in some useful information there.

The first 2 days of the class covered the setup and configuration of Cray CSA accounting and a detailed overview on the nature of the data. The third day covered customization via csagcon and csagfef.

Disappointments:
---------------

The training facility in Eagan is being closed and most of the UNICOS curriculum is being cancelled.... non-revenue generating.

As we've heard before, UNICOS (10.0) is feature-frozen.

Instructor sympathized with our (and others') needs for either detailed structure layouts and definitions (i.e., docs) or source level access for accounting, but he's been losing that internal battle for years. csafef and csafef2 are meaningful only with a source license. The corporate Cray|SGI response for these needs seems to be to let 3rd parties deliver accounting solutions and there is at least one product which does this... including interval accounting which may approximate resman. The package cited, TeamQuest (http://www.teamquest.com), seems to come from a company more focused on performance analysis than resource accounting; we'll look into it a little further.

Instructor felt a J90 may not be the best choice for an NFS file server... the primary concern being UNICOS's management of file system caching (particularly with fragmentation due to automounts). If this is something which hasn't been discussed we may want to follow up on it further. With the accounting data synchronized with sar it should be possible to determine cache effectiveness.

Class dealt with MPP accounting solely as T3D was implemented under UNICOS (vs. UNICOS/mk). However, CSA accounting is implemented under U/mk and much of the information will translate, although some of the performance metrics are different.

Other News:
----------

IRIX enhanced accounting has some features we may be interested in; it's part of the sat enhanced security. It does include project accounting.

CSA accounting will be ported to IRIX (after 6.5).

Most of the reporting we desire can be done with csagcon and csagfef. The csagcon consolidation without jobs as criteria does not aggregate timings (such as queue time), so turnaround and expansion factors required by HPC will require two-step generation (back-end program or spreadsheet, likely a program). Our other reporting needs appear to be manageable with csagcon|gfef with simply acid+uid aggregation.

In all cases we will need to back-end the reporting tools for history|trend analysis and likely project|sub-project hierarchies. In the past this was managed by archiving historical reports and to a degree by resman. The exact form this should take will require further discussion.... could be a back-end database or some tools (programs) for consolidating reports.

Also of concern is tracking of resource allocation and utilization (resman). Since we still need to do this and not just report usage we'll need to discuss|decide whether we want to try to maintain this within the udb or some back-end database. A back-end database may facilitate trend reporting.

As has been discussed, a live system report will need to be developed as accounting only deals with terminating processes... purpose would be to alert us, at least daily, of long-running jobs which are exceeding resource allocations or are potential runaways.

Some other tidbits:
------------------

Recommended accounting runs be moved to midnight (facilitates some reporting with a cleaner interval... acctcom in particular has some clock rollover restrictions).

Recommended accounting and sar intervals be synchronized (facilitates use of accounting data to isolate service and performance problems or bottlenecks).

Recommended the use of the archive1 vs. archive2 exit of csarun to retain the pre-processed accounting files instead of the session record files since some information is lost in consolidation (csabuild).

I personally recommend we enable performance accounting on denali so we have comparison metrics for the J90. I'll try to get to this in the next week... and we'll need to enable it on the J90 after we enter the user services level of testing.

The person re-writing the IRIX accounting documentation stopped by Thursday afternoon soliciting opinions on what we'd like to see. I have her email address... I've already asked for explicit documentation of accounting structure contents unless they are going to provide documented conversion routines (API or source).

Device level accounting has been retired on UNICOS.

CMG (Computer Measurement Group: X/Open, POSIX, etc. sponsored) has drafted (and approved) a specification for performance monitoring:
    C427 Systems Management: Universal Measurement Architecture (UMA)
    (ISBN 1-85912-117-7)
We can expect to see SGI/Cray abiding by this as it's adopted within the marketplace (assuming it is). It can be found on-line under:
    http://www.opengroup.org/public/pubs
We may want to have a copy ordered for reference purposes.

I found a couple of techniques where we may be able to incorporate our project hierarchy into csagfef reporting vs. back-ending the sub-project reporting. I need to try these... if these don't pan out we'll need to handle it with back-end reporting (e.g., programs).

Instructor provided a rather extensive tar kit which includes a variety of 'share-ware' customizations of existing csa accounting, including sources to a number of the modules. This included a second kit for IRIX. I'll get these to a common location after I've worked with it some.

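As a rough illustration of the two-step turnaround and expansion-factor generation mentioned under Other News, a back-end program could look something like the sketch below. It is only a sketch under stated assumptions: expansion factor is taken as turnaround (queue time plus run time) divided by run time, which is a common HPC convention but is not defined in this report, and the JobRecord type and its field names are hypothetical, not actual csagcon or csagfef output fields. Per-job records would have to be extracted by some first step before this aggregation by acid+uid could run.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical per-job record; field names are assumptions for this sketch."""
    acid: int             # account ID
    uid: int              # user ID
    queue_seconds: float  # time spent waiting in the queue
    run_seconds: float    # wall-clock time spent running


def turnaround(job):
    """Turnaround is taken here as queue wait plus run time."""
    return job.queue_seconds + job.run_seconds


def summarize(jobs):
    """Aggregate jobs by (acid, uid) and report turnaround and expansion factor.

    Expansion factor is assumed to be total turnaround / total run time,
    so a value of 1.0 means the jobs never waited in the queue.
    """
    groups = defaultdict(list)
    for job in jobs:
        groups[(job.acid, job.uid)].append(job)

    summary = {}
    for key, members in groups.items():
        total_turnaround = sum(turnaround(j) for j in members)
        total_run = sum(j.run_seconds for j in members)
        summary[key] = {
            "jobs": len(members),
            "mean_turnaround": total_turnaround / len(members),
            # Leave the factor undefined when no run time was recorded.
            "expansion_factor": (total_turnaround / total_run
                                 if total_run > 0 else None),
        }
    return summary


if __name__ == "__main__":
    sample = [
        JobRecord(acid=100, uid=1, queue_seconds=60, run_seconds=60),
        JobRecord(acid=100, uid=1, queue_seconds=0, run_seconds=120),
        JobRecord(acid=200, uid=2, queue_seconds=30, run_seconds=10),
    ]
    for key, row in sorted(summarize(sample).items()):
        print(key, row)
```

Summing turnaround and run time per group before dividing weights long jobs more heavily than short ones; averaging per-job expansion factors instead is an equally plausible choice and would need to be settled against whatever definition HPC reporting actually requires.
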
Unrelated tidbit (there is life after accounting):
----------------

I caught the opening night performance of _Riverdance_ in Minneapolis after the class Thursday. Outstanding.... it will be in Seattle in May (http://www.riverdance.com/), get your tickets early :).

k (with input from Dale & Derek)