Introduction to Common Tools on Exadata (Troubleshooting Tools)

Contact: QQ (5163721)

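Author: Lunar © All rights reserved [This article may be reposted, but the source address must be credited with a link; otherwise legal liability will be pursued.]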

Utility Path Usage/Comments
Infiniband Some of these tools may be found in /opt/oracle.SupportTools/ibdiagtools on cells or database servers. Also see the Infiniband Triage wiki page.
/opt/oracle.SupportTools/ibdiagtools/infinicheck
/opt/oracle.SupportTools/ibdiagtools/verify-topology
ibqueryerrors
/usr/bin/ibdiagnet Detecting fabric issues
/usr/sbin/ibaddr Examining HCA state & guids
/usr/sbin/ibcheckerrors Detecting fabric issues
/usr/sbin/ibcheckerrs Detecting fabric issues
/usr/sbin/ibcheckstate Detecting fabric issues
/usr/sbin/ibcheckwidth Detecting fabric issues
/usr/sbin/ibclearcounters Reset counters when detecting fabric issues
/usr/sbin/ibclearerrors Reset counters when detecting fabric issues
/usr/sbin/ibdatacounters Not directly used. perfquery is used instead
/usr/sbin/ibdatacounts Not directly used. perfquery is used instead
/usr/sbin/ibhosts Listing cells/db nodes
/usr/sbin/iblinkinfo.pl Obtaining the fabric topology
/usr/sbin/ibnetdiscover Obtaining the fabric topology
/usr/sbin/ibnodes Listing cells/db nodes/switches
/usr/sbin/ibping Checking IB level connectivity
/usr/sbin/ibportstate Testing port failure/disabling bad links
/usr/sbin/ibqueryerrors.pl Detecting fabric issues
/usr/sbin/ibstat Examining HCA state & guids
/usr/sbin/ibstatus Examining HCA state & guids
/usr/sbin/ibswitches Listing IB switch names
/usr/sbin/ibtracert Examining IB routes
/usr/sbin/perfquery Computing throughput, detecting fabric errors
/usr/sbin/saquery Not directly used
/usr/sbin/set_nodedesc.sh Setting the HCA node description based on node type
/usr/sbin/sminfo Determining location of master SM
/usr/sbin/smpdump Not directly used
/usr/sbin/smpquery Not directly used
/usr/sbin/vendstat Not directly used
/usr/bin/ibv_devices Listing local HCAs
/usr/bin/ibv_devinfo Listing details of local HCAs
/usr/bin/ibv_rc_pingpong Determining working status of HCA
/usr/bin/ibv_srq_pingpong Determining working status of HCA
/usr/bin/ibv_uc_pingpong Determining working status of HCA
/usr/bin/ibv_ud_pingpong Determining working status of HCA
/usr/bin/mstflint Burning new HCA firmware/obtaining current firmware version
/usr/bin/ib_rdma_bw Computing IB level stats for troubleshooting
/usr/bin/ib_rdma_lat Computing IB level stats for troubleshooting
/usr/bin/ib_read_bw Computing IB level stats for troubleshooting
/usr/bin/ib_read_lat Computing IB level stats for troubleshooting
/usr/bin/ib_send_bw Computing IB level stats for troubleshooting
/usr/bin/ib_send_lat Computing IB level stats for troubleshooting
/usr/bin/ib_write_bw Computing IB level stats for troubleshooting
/usr/bin/ib_write_lat Computing IB level stats for troubleshooting
/usr/bin/qperf Computing throughput for RDS/TCP/SDP protocols
/sbin/ifconfig Determining configuration/status of network interfaces
/usr/bin/ib-bond Determining active slave interface for bond0
/usr/bin/rds-gen Not directly used
/usr/bin/rds-info Examining RDS state
/usr/bin/rds-ping Determining RDS connectivity
/usr/bin/rds-sink Not directly used
/usr/bin/rds-stress Profiling RDS performance
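The perfquery entry above mentions computing throughput. A minimal sketch of that arithmetic, assuming two perfquery outputs captured a known number of seconds apart: the InfiniBand port data counters count 32-bit words, so the delta is multiplied by 4 to get bytes. The sample text and the counter name PortXmitData are illustrative assumptions (older perfquery builds label the same counter XmtData), and the helper names are hypothetical.

```python
import re

# Matches lines such as "PortXmitData:....................1000"
COUNTER_LINE = re.compile(r"^(\w+):\.*(\S+)\s*$")


def parse_counters(text):
    """Turn perfquery-style 'Name:....value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line.strip())
        if not match:
            continue  # header lines such as "# Port counters: ..." are skipped
        name, value = match.groups()
        try:
            counters[name] = int(value, 0)  # base 0 accepts decimal and 0x... hex
        except ValueError:
            pass  # ignore values that are not plain numbers
    return counters


def throughput_bytes_per_sec(before, after, seconds, key="PortXmitData"):
    """Bytes per second between two samples; data counters are in 4-byte units."""
    delta_words = after[key] - before[key]
    return delta_words * 4 / seconds


# Illustrative samples, not captured from a real system.
sample_t0 = "# Port counters: Lid 1 port 1\nPortXmitData:....................1000\n"
sample_t1 = "# Port counters: Lid 1 port 1\nPortXmitData:....................251000\n"

rate = throughput_bytes_per_sec(parse_counters(sample_t0), parse_counters(sample_t1), 10)
print(f"{rate:.0f} bytes/s")
```

The classic 32-bit port counters stop at their maximum value instead of wrapping, so resetting them first (for example with ibclearcounters, listed above) keeps the delta meaningful.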
Imaging and versions These tools are related to imaging status and info as well as versions installed
imagehistory
imageinfo Only on database servers version >= 11.2.1.3
/opt/oracle.cellos/CheckHWnFWProfile Only applicable on cells. With the -d option, it will display versions found. Without options, it will report any mismatches against known correct values.
/opt/oracle.SupportTools/CheckSWProfile.sh Only applicable on cells. Without options, displays any mismatch against known good configurations.
collectlogs.sh For collecting logs from onecommand deployments
Networking  
cat /proc/net/bonding/bond*
cat /sys/class/net/eth?/operstate
cat /sys/class/net/bond*/operstate
ifconfig
ethtool <interface_name> Reports information about the interface, such as link mode capabilities
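The bonding status files listed above are plain "Key: value" text, so the active slave and per-slave link state can be pulled out with a few lines of code. The following is a rough sketch, assuming the usual Linux bonding driver layout; the sample text is illustrative and the function names are hypothetical.

```python
def parse_bonding(text):
    """Extract the active slave and each slave's MII status from bonding text."""
    info = {"active_slave": None, "slaves": {}}
    current = None  # slave section currently being read, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and lines without a colon
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            info["active_slave"] = value
        elif key == "Slave Interface":
            current = value
            info["slaves"][current] = None
        elif key == "MII Status" and current is not None:
            # Only MII Status lines inside a slave section are recorded here;
            # the bond-level MII Status appears before any slave section.
            info["slaves"][current] = value
    return info


def down_slaves(info):
    """Names of slaves whose MII status is anything other than 'up'."""
    return [name for name, status in info["slaves"].items() if status != "up"]


# Illustrative sample, not captured from a real system.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: ib0
MII Status: up

Slave Interface: ib0
MII Status: up
Link Failure Count: 0

Slave Interface: ib1
MII Status: down
Link Failure Count: 1
"""

state = parse_bonding(SAMPLE)
print(state["active_slave"], down_slaves(state))
```

On a real node the text would come from reading a file such as /proc/net/bonding/bond0 instead of the embedded sample.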
Logfiles on both database server and cells
/var/log/messages Older versions of this file will be automatically renamed as messages.<number> with number 1 being the most recent history.
dmesg A command that displays the kernel ring buffer log
/var/log/cellos/validations.log
/var/log/cellos/validations/*log
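The /var/log/messages entry above notes that rotated copies are named messages.<number>, with 1 being the most recent. A small sketch of ordering such files from newest to oldest, assuming only that numeric-suffix scheme (date-suffixed rotations are not handled); the function name is hypothetical.

```python
import re


def order_rotated(names, base="messages"):
    """Return base, base.1, base.2, ... sorted newest first; other names are dropped."""
    pattern = re.compile(r"^" + re.escape(base) + r"(?:\.(\d+))?$")
    ranked = []
    for name in names:
        match = pattern.match(name)
        if match:
            # The live file has no suffix and ranks as 0 (newest).
            ranked.append((int(match.group(1) or 0), name))
    return [name for _, name in sorted(ranked)]


listing = ["messages.10", "messages", "secure", "messages.2", "messages.1"]
print(order_rotated(listing))
```

Sorting on the integer suffix rather than the string keeps messages.10 after messages.2.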
Logfiles on cells
$ADR_BASE/diag/asm/cell/<hostname>/trace/alert.log Cell’s alert log. The cell’s trace files are in the same directory as alert.log.
Logfiles on database servers
$ORACLE_BASE/diag/asm/+asm/<instname>/trace/alert_<instname>.log ASM alert logfile
$ORACLE_BASE/diag/rdbms/<dbname>/<instname>/trace/alert_<instname>.log DB alert log, one for each running database (there may be more than one DB)
/u01/app/11.2.0/grid/log/<hostname>/alert<hostname>.log Grid Infrastructure alert logfile. This log is relatively high-level and will often lead you to one of the logs mentioned in the entry just below this one.
/u01/app/11.2.0/grid/log/<hostname>/[cssd,crsd,diskmon]/*.log Logfiles for CSSD, CRSD, and diskmon processes. These processes are the most likely ones to have issues and will expose most issues.
Infiniband Switches These commands may be run on IB switches
sminfo shows the current subnet master switch in the fabric – there should be exactly one regardless of how many switches are present in the fabric
ibswitches lists all IB switches in the fabric
showunhealthy shows any unhealthy sensors
env_test lists all the data from the environmental sensors in the switch
nm2version shows the current versions – use this to determine what version the switch is running right now
getfanspeed shows the speed of the internal fans in the switch – can be useful if showunhealthy indicates a problem with one of the fans
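The sminfo entry above says there should be exactly one subnet master. One way to check this is to collect the sminfo line from several nodes and confirm they all name the same master. The sketch below assumes the common one-line sminfo output format shown in the sample; the sample values and function names are illustrative assumptions.

```python
import re

SMINFO_LINE = re.compile(
    r"sm lid (\d+) sm guid (0x[0-9a-fA-F]+), "
    r"activity count (\d+) priority (\d+) state (\d+) (\w+)"
)


def parse_sminfo(line):
    """Return a dict describing one sminfo output line, or None if it does not match."""
    match = SMINFO_LINE.search(line)
    if not match:
        return None
    lid, guid, activity, priority, state, state_name = match.groups()
    return {
        "lid": int(lid),
        "guid": guid.lower(),
        "activity": int(activity),
        "priority": int(priority),
        "state": int(state),
        "state_name": state_name,
    }


def agree_on_master(lines):
    """True when every parsed line reports MASTER state for the same lid and guid."""
    parsed = [parse_sminfo(line) for line in lines]
    if not parsed or any(p is None for p in parsed):
        return False
    if any(p["state_name"] != "SMINFO_MASTER" for p in parsed):
        return False
    return len({(p["lid"], p["guid"]) for p in parsed}) == 1


# Illustrative lines, as if gathered from two different nodes.
outputs = [
    "sminfo: sm lid 1 sm guid 0x21283a8389a0a0, activity count 933330 priority 5 state 3 SMINFO_MASTER",
    "sminfo: sm lid 1 sm guid 0x21283a8389a0a0, activity count 933341 priority 5 state 3 SMINFO_MASTER",
]
print(agree_on_master(outputs))
```

The activity count changes between calls, so it is deliberately left out of the comparison.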
Cell software commands (cellcli and friends) These commands may be run from within cellcli
list cell detail
list alerthistory
list celldisk detail
list griddisk detail
list lun detail
list physicaldisk detail
list flashcache detail
list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome
alter cell validate configuration
adrci show incident
mdadm --misc --detail /dev/md* for an overview of the state of the raid devices on the storage cell
cat /proc/mdstat for a view of the status of the devices
/usr/local/bin/ipconf -verify
mdadm -Q --detail /dev/md? state information on a particular meta device
<GRID_HOME>/bin/kfod disks=all lists disks available from DB node for ASM use (run on DB node)
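The griddisk attribute listing above (name, status, asmmodestatus, asmdeactivationoutcome) is typically read to see whether grid disks can be taken offline safely. A minimal sketch of scanning that output, assuming whitespace-separated columns in the order requested and that a safe disk shows ONLINE and Yes; the sample rows and the function name are illustrative assumptions.

```python
def risky_griddisks(listing):
    """Names of grid disks that are not ONLINE or whose deactivation outcome is not 'Yes'."""
    risky = []
    for line in listing.splitlines():
        # The last column may be a sentence, so split into at most four fields.
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # blank or incomplete lines
        name, _status, asm_mode, outcome = fields
        if asm_mode != "ONLINE" or outcome.strip() != "Yes":
            risky.append(name)
    return risky


# Illustrative output, not captured from a real cell.
SAMPLE = """
     DATA_CD_00_cell01     active     ONLINE     Yes
     DATA_CD_01_cell01     active     ONLINE     Cannot de-activate due to other offline disks in the diskgroup
     RECO_CD_00_cell01     active     ONLINE     Yes
"""

print(risky_griddisks(SAMPLE))
```

An empty result suggests every listed grid disk reported ONLINE and Yes at the time the listing was taken.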
Hardware These commands may be run to query hardware status. Unless otherwise noted, they apply to cells and database servers.
ipmitool sel list Lists the system event logs – these logs sometimes show HW events that aren’t seen elsewhere.
ipmitool sunoem cli 'show /SYS' Shows system serial number, fault_state (overall fault state, not necessarily a rollup; the fault may be at the component level)
/opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -a0 All adapter info
/opt/MegaRAID/MegaCli/MegaCli64 -FwTermLog -dsply -a0 Display controller’s log
/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -a0 Get battery status
/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuProperties -a0 Get battery properties
/opt/MegaRAID/MegaCli/MegaCli64 -LDinfo -Lall -aALL Look for WriteThrough in the Current Cache Policy; if write caching is disabled, performance may be affected. It is easier to get this information from cellcli -e list lun attributes name,lunWriteCacheMode,status
/opt/MegaRAID/MegaCli/MegaCli64 -LDPdInfo -aAll Helpful to investigate predictive failure if necessary
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0 The Inquiry Data will contain the drive firmware, but decoding the string to get the firmware requires special instructions – beyond what is here. Check list physicaldisk attributes physicalFirmware in cellcli for drive FW version.
lspci [-v [ -v [ -v ]]] Listing PCI devices. The more -v arguments you add, the more information detail it provides
lsscsi Especially helpful on cells. Flash cards will show up as MARVELL devices. There should be 16 flash devices listed. If not, there’s a card missing or not visible to the OS.
/opt/oracle.cellos/scripts_aura.sh This script lists the flash disks as they are seen by the cell software
/opt/oracle.SupportTools/sundiag.sh Gathers many diagnostic command outputs and important logfiles for analysis of storage cell and disk issues
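The lsscsi entry above says flash cards show up as MARVELL devices and that 16 should be listed. That check is easy to automate; the sketch below counts matching lines in saved lsscsi output. The sample lines are synthetic and only approximate the real column layout, and the function names are hypothetical.

```python
def count_flash_devices(lsscsi_output, marker="MARVELL"):
    """Count lsscsi lines that mention the flash device marker."""
    return sum(1 for line in lsscsi_output.splitlines() if marker in line)


def missing_flash_devices(lsscsi_output, expected=16):
    """How many flash devices are absent compared with the expected count."""
    return max(expected - count_flash_devices(lsscsi_output), 0)


def make_sample(flash_count):
    """Build synthetic lsscsi-like text: one RAID volume plus flash devices."""
    lines = ["[0:2:0:0]    disk    LSI      MR9261-8i        2.12  /dev/sda"]
    for i in range(flash_count):
        lines.append(f"[{i + 1}:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/flash{i}")
    return "\n".join(lines)


# Synthetic case with one flash device not visible to the OS.
sample = make_sample(15)
print(count_flash_devices(sample), missing_flash_devices(sample))
```

A nonzero result from missing_flash_devices corresponds to the case described above: a card is missing or not visible to the OS.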