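Contact: QQ (5163721)
Title: Introduction to Common Tools on Exadata (Troubleshooting Tools)
Author: Lunar © All rights reserved [This article may be reposted, but the source address must be credited with a link; otherwise legal action will be pursued.]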
| Utility Path | Usage/Comments |
| --- | --- |
| Infiniband | Some of these tools may be found in /opt/oracle.SupportTools/ibdiagtools on cells or database servers. Also see the Infiniband Triage wiki page. | 
| /opt/oracle.SupportTools/ibdiagtools/infinicheck | |
| /opt/oracle.SupportTools/ibdiagtools/verify-topology | |
| ibqueryerrors | |
| /usr/bin/ibdiagnet | Detecting fabric issues | 
| /usr/sbin/ibaddr | Examining HCA state & guids | 
| /usr/sbin/ibcheckerrors | Detecting fabric issues | 
| /usr/sbin/ibcheckerrs | Detecting fabric issues | 
| /usr/sbin/ibcheckstate | Detecting fabric issues | 
| /usr/sbin/ibcheckwidth | Detecting fabric issues | 
| /usr/sbin/ibclearcounters | Reset counters when detecting fabric issues | 
| /usr/sbin/ibclearerrors | Reset counters when detecting fabric issues | 
| /usr/sbin/ibdatacounters | Not directly used. perfquery is used instead | 
| /usr/sbin/ibdatacounts | Not directly used. perfquery is used instead | 
| /usr/sbin/ibhosts | Listing cells/db nodes | 
| /usr/sbin/iblinkinfo.pl | Obtaining the fabric topology | 
| /usr/sbin/ibnetdiscover | Obtaining the fabric topology | 
| /usr/sbin/ibnodes | Listing cells/db nodes/switches | 
| /usr/sbin/ibping | Checking IB level connectivity | 
| /usr/sbin/ibportstate | Testing port failure/disabling bad links | 
| /usr/sbin/ibqueryerrors.pl | Detecting fabric issues | 
| /usr/sbin/ibstat | Examining HCA state & guids | 
| /usr/sbin/ibstatus | Examining HCA state & guids | 
| /usr/sbin/ibswitches | Listing IB switch names | 
| /usr/sbin/ibtracert | Examining IB routes | 
| /usr/sbin/perfquery | Computing throughput, detecting fabric errors | 
| /usr/sbin/saquery | Not directly used | 
| /usr/sbin/set_nodedesc.sh | Setting the HCA node description based on node type | 
| /usr/sbin/sminfo | Determining the location of the master SM | 
| /usr/sbin/smpdump | Not directly used | 
| /usr/sbin/smpquery | Not directly used | 
| /usr/sbin/vendstat | Not directly used | 
| /usr/bin/ibv_devices | Listing local HCAs | 
| /usr/bin/ibv_devinfo | Listing details of local HCAs | 
| /usr/bin/ibv_rc_pingpong | Determining working status of HCA | 
| /usr/bin/ibv_srq_pingpong | Determining working status of HCA | 
| /usr/bin/ibv_uc_pingpong | Determining working status of HCA | 
| /usr/bin/ibv_ud_pingpong | Determining working status of HCA | 
| /usr/bin/mstflint | Burning new HCA firmware/obtaining current firmware version | 
| /usr/bin/ib_rdma_bw | Computing IB level stats for troubleshooting | 
| /usr/bin/ib_rdma_lat | Computing IB level stats for troubleshooting | 
| /usr/bin/ib_read_bw | Computing IB level stats for troubleshooting | 
| /usr/bin/ib_read_lat | Computing IB level stats for troubleshooting | 
| /usr/bin/ib_send_bw | Computing IB level stats for troubleshooting | 
| /usr/bin/ib_send_lat | Computing IB level stats for troubleshooting | 
| /usr/bin/ib_write_bw | Computing IB level stats for troubleshooting | 
| /usr/bin/ib_write_lat | Computing IB level stats for troubleshooting | 
| /usr/bin/qperf | Computing throughput for RDS/TCP/SDP protocols | 
| /sbin/ifconfig | Determining configuration/status of network interfaces | 
| /usr/bin/ib-bond | Determining active slave interface for bond0 | 
| /usr/bin/rds-gen | Not directly used | 
| /usr/bin/rds-info | Examining RDS state | 
| /usr/bin/rds-ping | Determining RDS connectivity | 
| /usr/bin/rds-sink | Not directly used | 
| /usr/bin/rds-stress | Profiling RDS performance | 
| Imaging and versions | These tools are related to imaging status and info as well as versions installed | 
| imagehistory | |
| imageinfo | Only on database servers version >= 11.2.1.3 | 
| /opt/oracle.cellos/CheckHWnFWProfile | Only applicable on cells. With the -d option, it displays the versions found. Without options, it reports any mismatches against known correct values. | 
| /opt/oracle.SupportTools/CheckSWProfile.sh | Only applicable on cells. Without options, displays any mismatch against known good configurations. | 
| collectlogs.sh | Collects logs from onecommand deployments | 
| Networking | |
| cat /proc/net/bonding/bond* | |
| cat /sys/class/net/eth?/operstate | |
| cat /sys/class/net/bond*/operstate | |
| ifconfig | |
| ethtool <interface_name> | Reports information about the interface, such as its supported link modes | 
| Logfiles on both database server and cells | |
| /var/log/messages | Older versions of this file will be automatically renamed as messages.<number> with number 1 being the most recent history. | 
| dmesg (a command that displays the kernel log) | |
| /var/log/cellos/validations.log | |
| /var/log/cellos/validations/*log | |
| Logfiles on cells | |
| $ADR_BASE/diag/asm/cell/<hostname>/trace/alert.log | Cell’s alert log. The cell’s trace files are in the same directory as alert.log. | 
| Logfiles on database servers | |
| $ORACLE_BASE/diag/asm/+asm/<instname>/trace/alert_<instname>.log | ASM alert logfile | 
| $ORACLE_BASE/diag/rdbms/<dbname>/<instname>/trace/alert_<instname>.log | DB alert log. There is one for each running database, so there may be more than one. | 
| /u01/app/11.2.0/grid/log/<hostname>/alert<hostname>.log | Grid Infrastructure alert logfile. This log is relatively high-level and will often lead you to one of the logs mentioned in the entry just below this one. | 
| /u01/app/11.2.0/grid/log/<hostname>/[cssd,crsd,diskmon]/*.log | Logfiles for the CSSD, CRSD, and diskmon processes. These processes are the most likely to have problems, and their logs expose most issues. | 
| Infiniband Switches | These commands may be run on IB switches | 
| sminfo | shows the current subnet master switch in the fabric – there should be exactly one regardless of how many switches are present in the fabric | 
| ibswitches | lists all IB switches in the fabric | 
| showunhealthy | shows any unhealthy sensors | 
| env_test | lists all the data from the environmental sensors in the switch | 
| nm2version | shows the current versions – use this to determine what version the switch is running right now | 
| getfanspeed | shows the speed of the internal fans in the switch – can be useful if showunhealthy indicates a problem with one of the fans | 
| Cell software commands (cellcli and friends) | These commands may be run from within cellcli | 
| list cell detail | |
| list alerthistory | |
| list celldisk detail | |
| list griddisk detail | |
| list lun detail | |
| list physicaldisk detail | |
| list flashcache detail | |
| list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome | |
| alter cell validate configuration | |
| adrci | show incident | 
| mdadm --misc --detail /dev/md* | for an overview of the state of the raid devices on the storage cell | 
| cat /proc/mdstat | for a view of the status of the devices | 
| /usr/local/bin/ipconf --verify | |
| mdadm -Q --detail /dev/md? | state information on a particular meta device | 
| <GRID_HOME>/bin/kfod disks=all | lists disks available from DB node for ASM use (run on DB node) | 
| Hardware | These commands may be run to query hardware status. Unless otherwise noted, they apply to cells and database servers. | 
| ipmitool sel list | Lists the system event logs – these logs sometimes show HW events that aren’t seen elsewhere. | 
| ipmitool sunoem cli 'show /SYS' | Shows the system serial number and fault_state (the overall fault state, which is not necessarily a rollup; the fault may be at the component level) | 
| /opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -a0 | All adapter info | 
| /opt/MegaRAID/MegaCli/MegaCli64 -FwTermLog -dsply -a0 | Display the controller’s log | 
| /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -a0 | Get battery status | 
| /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuProperties -a0 | Get battery properties | 
| /opt/MegaRAID/MegaCli/MegaCli64 -LDinfo -Lall -aALL | Look for WriteThrough in the Current Cache Policy; if the write-back cache is disabled, performance may be affected. This information is easier to get from cellcli -e list lun attributes name,lunWriteCacheMode,status | 
| /opt/MegaRAID/MegaCli/MegaCli64 -LDPdInfo -aAll | Helpful to investigate predictive failure if necessary | 
| /opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0 | The Inquiry Data will contain the drive firmware, but decoding the string to get the firmware requires special instructions – beyond what is here. Check list physicaldisk attributes physicalFirmware in cellcli for drive FW version. | 
| lspci [-v [ -v [ -v ]]] | Listing PCI devices. The more -v arguments you add, the more information detail it provides | 
| lsscsi | Especially helpful on cells. Flash cards will show up as MARVELL devices. There should be 16 flash devices listed. If not, there’s a card missing or not visible to the OS. | 
| /opt/oracle.cellos/scripts_aura.sh | This script lists the flash disks as the cell software will see them | 
| /opt/oracle.SupportTools/sundiag.sh | Gathers many diagnostic command outputs and important logfiles for analysis of storage cell and disk issues | 
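
Two of the checks in the table lend themselves to a quick script: the lsscsi row says a cell should show 16 MARVELL flash devices, and the Networking rows read operstate files under /sys/class/net. The following is a minimal sketch of both, not an official tool. The function names are hypothetical, and the "<interface> <state>" line format used by `list_down_interfaces` is an assumption of this sketch. The filters read from stdin, so they can also be tried against saved command output.

```shell
#!/bin/sh
# Sketch only: helper functions, nothing runs until you call them.

# Count flash devices in `lsscsi` output read from stdin.
# Per the table, flash cards show up as MARVELL devices.
count_flash_devices() {
    # grep -c exits non-zero when the count is 0; keep the "0" output anyway.
    grep -c 'MARVELL' || true
}

# Compare the number of flash devices against an expected count.
# The default of 16 comes from the lsscsi row in the table.
check_flash_devices() {
    expected=${1:-16}
    found=$(count_flash_devices)
    if [ "$found" -eq "$expected" ]; then
        echo "OK: $found flash devices"
    else
        echo "WARN: $found flash devices, expected $expected"
        return 1
    fi
}

# Print "<interface> <state>" for the eth? and bond* interfaces,
# reading the same operstate files listed in the Networking rows.
print_operstates() {
    for f in /sys/class/net/eth?/operstate /sys/class/net/bond*/operstate; do
        # Skip patterns that matched nothing on this host.
        [ -r "$f" ] || continue
        echo "$(basename "$(dirname "$f")") $(cat "$f")"
    done
}

# From "<interface> <state>" lines on stdin, print interfaces not "up".
list_down_interfaces() {
    awk '$2 != "up" { print $1 }'
}
```

On a cell, `lsscsi | check_flash_devices` reports whether all 16 flash devices are visible to the OS, and `print_operstates | list_down_interfaces` prints any interface that is not up. Pass a different number to `check_flash_devices` if your hardware generation has a different flash layout.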