man ua_uerf





NAME


  ua_uerf - uerf output filter


SYNOPSIS


  uerf [-options] | ua_uerf [-options]
  ua_uerf [input-file] [-options]


DESCRIPTION


  The Digital UNIX error reporting utility uerf and the newer dia (DECevent)
  produce detailed reports that are impractical to scan for error trend
  analysis.  The ua_uerf utility is a uerf filter that can summarize errors:

       by type;
       by day;
       by grand total;
       by exclusion;
       by limited volumes;
       by single line summary

  in a form suitable for ad hoc reporting, daily email summaries, or weekly
  or monthly management summaries.

  The ua_uerf utility is not a replacement for dia (or uerf), but it does
  provide a quick scanning ability to determine when you need to drill deeper
  into hardware errors.


ARGUMENTS


  input-file

       Input file specification.  Defaults to standard input, since uerf
       output is typically piped directly into ua_uerf.


OPTIONS


  Record Type Options

  -all	     Show all record types (default)
  -none	     Show no record types, implies -total
  -boot	     Show boot|oper records
  -hardware  Show hardware records
  -scsi	     Show scsi cam records
  -unix	     Show software records
  -misc	     Show any other record types

  Filtering Options

  nosummary  Do not display summary totals (this option has no leading
             hyphen).

  -total     Show summary totals by day.

  -ignore    string1[,string2...]

       Ignore record types with matching strings.

  -keep	     string1[,string2...]

       Keep record types with matching strings.
       Used to retain records which matched -ignore.

  -limit     N

       Limit the number of replicated records displayed for a day.
       The default is 5; use zero to see only totals.

  Other Options

  -output    output-file

       To specify an output file.

  -verbose   To generate some debugging displays.

  -?	     To display a terse help message.


EXAMPLES


  A 132 column display is strongly recommended for ua_uerf.
  The following aliases are used in the examples:

       sxkac@glacier: alias | grep UERF
       UA7UERF='uerf -c err,oper -o full -t s:`ua_date -uerf -7` | /usr/local/sbin/ua_uerf -a'
       UAXUERF='uerf -c err,oper -o full -t s:`ua_date -uerf -30` | /usr/local/sbin/ua_uerf -a'
       UA_UERF='uerf -c err,oper -o full -t s:`ua_date -uerf -1` | /usr/local/sbin/ua_uerf -a'

  ua_date is a date formatting routine with several predefined UNIX date
  formats and the ability to specify "delta" days, such as "7 days ago" with
  UA7UERF above.  You can, of course, key in a date in uerf format instead;
  ua_date produces that format as follows:

       sxkac@nugget: ua_date -uerf -7
       03-sep-1997,00:00:00
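
  ua_date is a local utility and its source is not shown here.  Where it is
  not installed, a similar start time can be sketched with GNU date.  This
  is only an approximation under stated assumptions: the -d option is a GNU
  extension (the stock Digital UNIX date may not support it), and the
  function name uerf_date is hypothetical.

```shell
#!/bin/sh
# uerf_date N: print midnight N days ago in uerf format (dd-mmm-yyyy,hh:mm:ss).
# Assumes GNU date (-d is a GNU extension); uerf_date is a hypothetical
# helper and is not part of ua_date.
uerf_date() {
    days=${1:-1}
    # LC_ALL=C keeps the month abbreviation in English; tr lowercases it.
    LC_ALL=C date -d "$days days ago" '+%d-%b-%Y,00:00:00' |
        tr '[:upper:]' '[:lower:]'
}

uerf_date 7
```

  The result can be passed to uerf as -t s:`uerf_date 7` in the same way the
  aliases above use ua_date.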

  A ua_uerf example for "yesterday" showing only 2 entries for any given type
  and ignoring cdisk_rec_status errors:

       sxkac@glacier: UA_UERF -l2 -i cdisk_rec_status
       #glacier	 Tue Sep  9 1997
       >04:26:41     46	 199 Bus:12 lu:98.0 R=ctape_move_tape:::Unexpected CCB status
       >09:10:22     47	 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
       *09:10:22     48	 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
       >12:43:07     50	 199 Bus:03 lu:28.0 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 8082899
       >18:44:57     52	 199 Bus:09 lu:73.0 R=cdisk_rec_tur_done:::Event - Unit Attention
       *23:06:52     54	 199 Bus:09 lu:73.0 R=cdisk_rec_tur_done:::Event - Unit Attention
       >23:11:44     60	 199 Bus:09 lu:75.0 R=cdisk_rec_tur_done:::Event - Unit Attention
       *23:12:24     62	 199 Bus:09 lu:75.0 R=cdisk_rec_tur_done:::Event - Unit Attention

       Summary:
	    Total     1	 199 Bus:12 lu:98.0 R=ctape_move_tape:::Unexpected CCB status
	    Total     4	 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
	    Total     1	 199 Bus:03 lu:28.0 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 8082899
	    Total    16	 199 Bus:09 lu:73.0 R=cdisk_rec_tur_done:::Event - Unit Attention
	    Total    17	 199 Bus:09 lu:75.0 R=cdisk_rec_tur_done:::Event - Unit Attention

  Normally, the UA_UERF alias is used "as is" to summarize recent errors when
  investigating a problem; the options in the example above just keep down
  the size of this man page.  Note that you must use dia to determine that
  the BBR errors above were soft (correctable) errors.
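
  The interaction between -limit and the summary totals can be sketched
  with awk.  This is an illustration of the idea only, not the ua_uerf
  implementation: the function name limit_and_total and the simplified
  "time|record" input format are assumptions made for the example.

```shell
#!/bin/sh
# limit_and_total LIMIT: read "time|record" lines on stdin, print at most
# LIMIT lines per distinct record, then a total for every record seen.
# Hypothetical helper; the input format is simplified for illustration.
limit_and_total() {
    awk -F'|' -v limit="$1" '
        {
            if (!($2 in count)) order[++n] = $2   # remember first-seen order
            count[$2]++
            if (count[$2] <= limit) print $1, $2  # display only the first LIMIT
        }
        END {
            print "Summary:"
            for (i = 1; i <= n; i++)
                printf "Total %5d %s\n", count[order[i]], order[i]
        }'
}

printf '%s\n' \
    '09:10:22|cdisk_bbr_done' \
    '09:10:22|cdisk_bbr_done' \
    '10:41:37|cdisk_bbr_done' \
    '12:43:07|ctape_move_tape' | limit_and_total 2
```

  With a limit of 2, the third cdisk_bbr_done line is not displayed but is
  still counted in the summary.  With a limit of 0, only the summary
  appears.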

  To find specifically when all the BBR errors occurred:

       sxkac@glacier: UA_UERF nosum -i cdisk,ctape -k cdisk_bbr -l20
       #glacier	 Tue Sep  9 1997
       >09:10:22     47	 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
       >09:10:22     48	 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
       >10:41:37     49	 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
       >12:43:07     50	 199 Bus:03 lu:28.0 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 8082899
       >23:16:13     81	 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150

  A ua_uerf weekly summary:

       sxkac@spike: UA7UERF -l0
       #spike	 Sun Sep  7 1997
       #spike	 Sun Sep  7 1997 07:53:02     19  301 SYSTEM SHUTDOWN	|halted by sxkac:  Apply HSZ v3.1-1,-2 patches
       #spike	 Sun Sep  7 1997 08:20:11      0  300 SYSTEM STARTUP

       Summary:
	    Total     1	 301 SYSTEM SHUTDOWN
	    Total     1	 300 SYSTEM STARTUP
	    Total     1	 199 Bus:03 lu:26.1 R=cdisk_check_sense:::Event - Unit Attention
	    Total     2	 199 Bus:03 lu:26.1 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
	    Total     2	 199 Bus:03 lu:26.1 R=cdisk_rec_tur_done:::Event - Unit Attention
	    Total     1	 199 Bus:03 lu:26.2 R=cdisk_check_sense:::Event - Unit Attention
	    Total     2	 199 Bus:03 lu:26.2 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
	    Total     2	 199 Bus:03 lu:26.2 R=cdisk_rec_tur_done:::Event - Unit Attention
	    Total     3	 199 Bus:03 lu:25.0 R=cdisk_op_spin:::Event - Unit Attention
	    Total     3	 199 Bus:02 lu:18.0 R=cdisk_check_sense:::Event - Unit Attention
	    Total     3	 199 Bus:02 lu:17.0 R=cdisk_check_sense:::Event - Unit Attention
	    Total     1	 199 Bus:02 lu:20.3 R=cdisk_check_sense:::Event - Unit Attention
	    Total     1	 199 Bus:02 lu:20.3 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
	    Total     1	 199 Bus:02 lu:20.3 R=cdisk_rec_tur_done:::Event - Unit Attention

  A ua_uerf monthly summary for July:

       sxkac@glacier: uerf -f /var/adm/binary.errlog.970501 -o full \
       > -t s:`ua_date -u 7/1` e:`ua_date -u 7/31` |
       > ua_uerf -l0
       #glacier	 Sat Jul  5 1997
       #glacier	 Sat Jul  5 1997 10:28:16      0  110 MACHINE STATE	|CONFIGURATION
       #glacier	 Sat Jul  5 1997 10:28:16      1  300 SYSTEM STARTUP
       #glacier	 Sat Jul  5 1997 13:07:13      2  301 SYSTEM SHUTDOWN	|halted by root:  continue with controller upgrades
       #glacier	 Sat Jul  5 1997 14:03:48      0  110 MACHINE STATE	|CONFIGURATION
       #glacier	 Sat Jul  5 1997 14:03:49      1  300 SYSTEM STARTUP
       #glacier	 Sun Jul  6 1997
       #glacier	 Sun Jul  6 1997 09:57:24      3  301 SYSTEM SHUTDOWN	|halted by root:  install kzpsa*8 and ba660
       #glacier	 Sun Jul  6 1997 12:57:36      0  110 MACHINE STATE	|CONFIGURATION
       #glacier	 Sun Jul  6 1997 12:57:36      1  300 SYSTEM STARTUP
       #glacier	 Sun Jul 13 1997
       #glacier	 Sun Jul 13 1997 09:49:24     66  301 SYSTEM SHUTDOWN	|halted by sxkac:  Move ba660 to piu1
       #glacier	 Sun Jul 13 1997 10:38:50      0  110 MACHINE STATE	|CONFIGURATION
       #glacier	 Sun Jul 13 1997 10:38:51      1  300 SYSTEM STARTUP
       #glacier	 Sun Jul 20 1997
       #glacier	 Sun Jul 20 1997 10:11:59     29  301 SYSTEM SHUTDOWN	|halted by sxkac:  Apply DU v3.2g-002 patches
       #glacier	 Sun Jul 20 1997 11:09:14      0  110 MACHINE STATE	|CONFIGURATION
       #glacier	 Sun Jul 20 1997 11:09:14      1  300 SYSTEM STARTUP
       #glacier	 Sun Jul 20 1997 14:26:45      2  301 SYSTEM SHUTDOWN	|rebooted by root:  Adjust ubc-maxpercent &
       #glacier	 Sun Jul 20 1997 14:31:29      0  110 MACHINE STATE	|CONFIGURATION
       #glacier	 Sun Jul 20 1997 14:31:29      1  300 SYSTEM STARTUP
       #glacier	 Wed Jul 30 1997
       #glacier	 Wed Jul 30 1997 10:55:20    247  301 SYSTEM SHUTDOWN	|halted by sxdjd:  potential power outtage
       #glacier	 Wed Jul 30 1997 12:10:13      0  110 MACHINE STATE	|CONFIGURATION
       #glacier	 Wed Jul 30 1997 12:10:13      1  300 SYSTEM STARTUP

       Summary:
	    Total    18	 199 Bus:05 lu:41.0 R=ctape_iodone:Soft Error Detected (rec:DEC TZ877:+
	    Total     1	 199 Bus:03 lu:28.2 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 604763
	    Total     5	 199 Bus:05 lu:41.1 R=changer_check_status:::Recovered error
	    Total     1	 199 Bus:07 lu:57.1 R=cdisk_complete:::Retries Exhausted
	    Total     7	 300 SYSTEM STARTUP
	    Total     6	 301 SYSTEM SHUTDOWN
	    Total    15	 199 Bus:10 lu:81.1 R=changer_check_status:::Recovered error
	    Total    13	 199 Bus:07 lu:57.1 R=cdisk_check_sense:::Event - Unit Attention
	    Total     1	 199 Bus:07 lu:60.2 R=cdisk_check_sense:::Event - Unit Attention
	    Total     1	 199 Bus:07 lu:60.0 R=cdisk_check_sense:::Event - Unit Attention
	    Total     8	 199 Bus:07 lu:58.0 R=cdisk_check_sense:::Event - Unit Attention
	    Total     1	 199 Bus:07 lu:59.0 R=cdisk_check_sense:::Event - Unit Attention
	    Total     1	 199 Bus:07 lu:60.3 R=cdisk_check_sense:::Event - Unit Attention
	    Total     4	 199 Bus:01 lu:10.0 R=cdisk_check_sense:::Event - Unit Attention
	    Total     4	 199 Bus:01 lu: 9.1 R=cdisk_check_sense:::Event - Unit Attention
	    Total     2	 199 Bus:03 lu:27.0 R=cdisk_check_sense:::Event - Unit Attention
	    Total     2	 199 Bus:03 lu:28.3 R=cdisk_check_sense:::Event - Unit Attention
	    Total     2	 199 Bus:03 lu:28.0 R=cdisk_check_sense:::Event - Unit Attention
	    Total    12	 199 Bus:03 lu:25.0 R=cdisk_check_sense:::Event - Unit Attention
	    Total    12	 199 Bus:03 lu:26.1 R=cdisk_check_sense:::Event - Unit Attention
	    Total     2	 199 Bus:03 lu:28.2 R=cdisk_check_sense:::Event - Unit Attention
	    Total     2	 199 Bus:01 lu: 9.0 R=cdisk_check_sense:::Event - Unit Attention
	    Total     1	 199 Bus:10 lu:81.0 R=ctape_wfm:Soft Error Detected (rec:DEC TZ877:+
	    Total     2	 199 Bus:10 lu:81.1 R=changer_send_ccb:::CCB aborted (timeout), recovering
	    Total     2	 199 Bus:02 lu:16.1 R=changer_send_ccb:::CCB aborted (timeout), recovering
	    Total     1	 199 Bus:02 lu:16.1 R=changer_online:::Device Not Ready
	    Total     1	 199 Bus:03 lu:28.3 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 369753
	    Total     2	 100 CPU EXCEPTION
	    Total     1	 199 Bus:03 lu:28.2 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 608417
	    Total   101	 199 Bus:08 lu:67.0 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
	    Total    97	 199 Bus:08 lu:67.0 R=cdisk_rec_tur_done:::Event - Unit Attention
	    Total    19	 199 Bus:08 lu:65.0 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
	    Total    15	 199 Bus:08 lu:65.0 R=cdisk_rec_tur_done:::Event - Unit Attention
	    Total     3	 199 Bus:08 :R=spo_bus_reset_rspn:::Bus reset request from adapter detected (reason = 0x9)
	    Total     3	 199 Bus:08 :R=spo_process_ccb:::A SCSI bus reset has been done
	    Total     3	 199 Bus:08 lu:66.0 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
	    Total     3	 199 Bus:08 lu:68.0 R=cdisk_rec_status:::Recovery progress event, this is NOT an error

  The monthly summary above filters down 43,731 lines of uerf output.  To
  find when the BBR and CPU errors occurred:

       sxkac@glacier: uerf -f /var/adm/binary.errlog.970501 -o full \
       > -t s:`ua_date -u 7/1` e:`ua_date -u 7/31` > uerf.july
       sxkac@glacier: wc -l uerf.july
	    43731 uerf.july
       sxkac@glacier: ua_uerf uerf.july -hardware nosum \
       > -scsi -k cdisk_bbr -i cdisk,changer,ctape
       #glacier	 Tue Jul  1 1997
       >17:12:48    317	 199 Bus:03 lu:28.2 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 604763
       #glacier	 Sat Jul 12 1997
       >10:30:29     65	 199 Bus:03 lu:28.3 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 369753
       #glacier	 Thu Jul 17 1997
       >08:07:35      2	 100 CPU EXCEPTION
       #glacier	 Fri Jul 18 1997
       >08:11:29      3	 199 Bus:03 lu:28.2 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 608417
       #glacier	 Thu Jul 24 1997
       >22:54:10     60	 199 Bus:08 :R=spo_bus_reset_rspn:::Bus reset request from adapter detected (reason = 0x9)
       >22:54:10     61	 199 Bus:08 :R=spo_process_ccb:::A SCSI bus reset has been done
       #glacier	 Sat Jul 26 1997
       >01:12:15     78	 199 Bus:08 :R=spo_bus_reset_rspn:::Bus reset request from adapter detected (reason = 0x9)
       >01:12:15     79	 199 Bus:08 :R=spo_process_ccb:::A SCSI bus reset has been done
       >01:16:50     88	 199 Bus:08 :R=spo_bus_reset_rspn:::Bus reset request from adapter detected (reason = 0x9)
       >01:16:50     89	 199 Bus:08 :R=spo_process_ccb:::A SCSI bus reset has been done
       #glacier	 Tue Jul 29 1997
       >08:58:55    224	 100 CPU EXCEPTION

  The example above selects hardware and scsi records, uses -ignore to filter
  out the noise (changer, ctape, and most cdisk errors), and uses -keep to
  retain the cdisk_bbr records.
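
  The way -keep overrides -ignore can be sketched as a small awk filter.
  This is a minimal sketch based on the option descriptions above, not the
  actual ua_uerf code.  The function name filter_records is hypothetical,
  and plain substring matching is an assumption.

```shell
#!/bin/sh
# filter_records IGNORE KEEP: IGNORE and KEEP are comma-separated strings.
# A line is dropped when it contains any IGNORE string, unless it also
# contains a KEEP string.  Hypothetical helper using substring matching.
filter_records() {
    awk -v ignore="$1" -v keep="$2" '
        BEGIN { ni = split(ignore, ig, ","); nk = split(keep, kp, ",") }
        {
            drop = 0
            for (i = 1; i <= ni; i++)
                if (ig[i] != "" && index($0, ig[i])) drop = 1
            for (i = 1; i <= nk; i++)
                if (kp[i] != "" && index($0, kp[i])) drop = 0
            if (!drop) print
        }'
}

printf '%s\n' \
    'R=cdisk_bbr_done' \
    'R=cdisk_check_sense' \
    'R=ctape_iodone' \
    'CPU EXCEPTION' | filter_records 'cdisk,changer,ctape' 'cdisk_bbr'
```

  Only the cdisk_bbr_done and CPU EXCEPTION lines pass: the first matches
  an ignore string but is rescued by the keep string, and the last matches
  neither list.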


RESTRICTIONS / NOTES


  ua_uerf has been tested under DU v3.2g and v4.0b.
  As stated above, a 132 column display is recommended.
  Suggestions for enhancements or bug reports can be directed to
  fnkac@uaf.edu.

  The ua_uerf utility uses the cci command parser, as used by non-UNIX
  operating systems, instead of the traditional UNIX getopt() parsing.
  Options have generally been defined to look like UNIX-style options, but
  in many cases they can be spelled out or abbreviated.  For example, '-l'
  is the same as '-limit'.  In some cases, such as 'nosummary', options must
  be partially or fully spelled out.  The required length of each option is
  shown by ua_uerf -?.  Because of this parser, multiple options must be
  separated by spaces, and the hyphen is part of the option name.
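
  Abbreviated options can be pictured as prefix matching against the full
  option names.  The sketch below uses unique-prefix matching, which is an
  assumption: the real cci parser enforces a required length per option (as
  reported by ua_uerf -?), so it may reject abbreviations that this sketch
  accepts.  The function name expand_option is hypothetical.

```shell
#!/bin/sh
# Option names taken from the OPTIONS section; the hyphen is part of the name.
OPTIONS='-all -none -boot -hardware -scsi -unix -misc nosummary -total
-ignore -keep -limit -output -verbose'

# expand_option ABBREV: print the one option name that starts with ABBREV.
# Returns non-zero when there is no match or the abbreviation is ambiguous.
# Unique-prefix matching is an assumption, not the documented cci rule.
expand_option() {
    abbrev=$1
    match=''
    count=0
    for opt in $OPTIONS; do
        case $opt in
            "$abbrev"*) match=$opt; count=$((count + 1)) ;;
        esac
    done
    [ "$count" -eq 1 ] && printf '%s\n' "$match"
}

expand_option -l
expand_option nosum
```

  Here '-l' expands to '-limit' and 'nosum' to 'nosummary', while a bare
  '-' is rejected because it is a prefix of several options.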

  From the monthly example above, one must use dia to determine that the two
  "CPU EXCEPTION" errors were single-bit correctable memory errors.
  Generally, that determination should occur on a daily basis.  Sample
  scripts for emailing summaries and some sed filters for parsing dia BBR and
  CPU errors are included with the ua_uerf distribution.  If one uses
  meaningful text for the shutdown reason, the binary.errlog files can be
  used for longer-term tracking of events.  Also included with the
  distribution is a script to "roll" and preserve binary.errlog every four
  months.


ACKNOWLEDGEMENTS


  The ua_uerf utility was written at the University of Alaska.


RELATED INFORMATION


  Commands: uerf(8), dia(8), ua_date(8).