Four Methods for Reimaging Exadata

Contact: QQ (5163721)

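Author: Lunar © All rights reserved. [This article may be reposted, but the source address must be credited with a link; otherwise legal liability will be pursued.]

I have machines to reimage again tomorrow. It has been a long time since I did installer work and I have nearly forgotten it all, so here is a refresher. O(∩_∩)O haha~

1. Before reimaging, check and save the information about the key components of the current system, for example: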

/opt/oracle.SupportTools/CheckHWnFWProfile -s

/opt/oracle.SupportTools/CheckHWnFWProfile -c loose

/opt/oracle.SupportTools/CheckSWProfile.sh -I dm01sw-ib2,dm01sw-ib3

imageinfo

imagehistory
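
To keep these outputs for comparison after the reimage, the commands can be wrapped in a small helper. This is only a sketch: the save_output function and the /tmp/pre_reimage directory are hypothetical names, not part of the Exadata tooling.

```shell
# save_output DIR NAME CMD [ARGS...]
# Hypothetical helper: runs CMD and stores its stdout and stderr in
# DIR/NAME.txt, followed by a line recording the exit status.
save_output() (
    dir=$1; name=$2; shift 2
    mkdir -p "$dir" || exit 1
    "$@" > "$dir/$name.txt" 2>&1
    echo "exit status: $?" >> "$dir/$name.txt"
)

# Example usage on a node (commented out so the sketch runs anywhere):
# save_output /tmp/pre_reimage imageinfo imageinfo
# save_output /tmp/pre_reimage imagehistory imagehistory
# save_output /tmp/pre_reimage hwfw /opt/oracle.SupportTools/CheckHWnFWProfile -c loose
```

Copy the resulting directory off the node before reimaging, since the local disks are overwritten.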

2. Following My Oracle Support Note 888828.1, find the relevant image, download it, and extract it, for example:
unzip ImageMaker.tar.zip
tar -pxvf ImageMaker.tar

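After untarring the DB image you will find a dl360 directory.
After untarring the CELL image you will find a dl180 directory.
This is because the Exadata V1, launched in cooperation with HP, used HP's PC server line: the compute nodes were the dl360 model and the storage nodes were the dl180 model. The directory names have never been changed since.

There are four ways to reimage:
1. From a USB stick, that is, a USB flash thumb drive
2. Build an ISO image and attach it through ILOM (if you burn it to a disc, you can of course also use DVD mode)
3. Build a rescue-boot ISO file (similar to an emergency boot disk), put the image on NFS, and reimage from there
4. PXE+NFS

For a quarter rack, none of these four methods is complicated; the USB stick and the ISO image are the simplest and least trouble.
For a full rack or a large amount of reimage work, the USB stick is clearly impractical because it is exhausting; use PXE+NFS or the ISO image instead.

Whichever method you use, the reimage media is built with the same command, makeImageMedia.sh. The syntax is: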

./makeImageMedia.sh 
  [-preconf <prconf.csv file full pathname>]
  [
      <dvd iso file name>
    | [-pxe [-pxeout <pxe output filename> ]] 
    | [<nfs iso filename> -nfs nfs_share -dir nfs_dir [-nfs_ip <ip addr for nfs server>]
        [-dhcp] 
        [-logpath <[lognfs_ip:]/full path to writeable nfs share>]
      ]
  ]

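Exadata ships from the factory with two operating systems, Linux and Solaris x86. The vast majority of customers, at least in China, choose Linux, so after installation we need to run the reclaim operation.
When reimaging, we can also add the -dualboot=no option when building the USB stick or the image, or when using PXE (the readme below lists the forms "-dualboot <yes|no>" for makeImageMedia.sh and "dualboot=<yes|no>" for the PXE command line). This saves the time of the reclaim after the reimage (as I recall, a reclaim takes more than an hour).

Exadata ships with default IP addresses, which can be found in the documentation delivered with the machine.
Location of the official Exadata documentation
The documentation lists all the factory default IPs. You can see that the IPs are assigned by physical position in the rack (the X4, of course, no longer has the spine switch at the bottom; if you need one for interconnecting racks, it can be purchased separately).
For example, with a quarter rack you can determine each component's management IP, ILOM IP and so on from its position.

When reimaging you can supply a new IP configuration file, so that the machine comes up with the full set of new IPs you specified. If you do not supply one, all IPs on the machine after the reimage are the defaults listed in the documentation.
For example, the following run builds a USB stick; I used a prepared preconf.csv file to specify the new IPs:
1. Booting from a USB stick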

[root@dm01db01 dl360]# ./makeImageMedia.sh -preconf /tmp/preconf.csv
Done. Pre config verification OK
Please wait. Calculating md5 checksums for cellbits ...
Please wait. Making initrd ...
199367 blocks
Please wait. Calculating md5 checksums for boot ...

Choose listed USB devices to set up the Oracle CELL installer

sdd   Approximate capacity 15441 MB
Enter the comma separated (no spaces) list of devices or word 'ALL' for to select all: sdd
sdd will be used as the Oracle CELL installer

All data on sdd will be erased. Proceed [y/n]? y

The number of cylinders for this disk is set to 1922.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.


The number of cylinders for this disk is set to 1922.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): Command action
   e   extended
   p   primary partition (1-4)
Partition number (1-4): First cylinder (1-1922, default 1): Last cylinder or +size or +sizeM or +sizeK (1-1922, default 1922):
Command (m for help): The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
umount2: Invalid argument
umount: /dev/sdd1: not mounted
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
1929536 inodes, 3857600 blocks
192880 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=3951034368
118 block groups
32768 blocks per group, 32768 fragments per group
16352 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 28 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
Copying files... will take several minutes


    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]
grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
grub> setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  16 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+16 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> Done creation of installation USB for DL360
[root@dm01db01 dl360]#

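Then boot the server from this USB stick. After it boots, confirm twice (it prompts you, in case you started it by mistake, O(∩_∩)O haha~). After that there is nothing left for you to do; it does the work by itself.
It reboots several times along the way. On the last one it prompts you to pull out the USB stick; reboot once more and you are done.

2. Using an ISO image
ILOM can redirect a remote ISO file to the server, so we can start ILOM, choose the CD-ROM image option, and then issue a reset from ILOM to reboot.
On an X2 you can go and have tea at this point. On an X3 you still have to wait for startup and confirm in the BIOS that it should boot from CDROM, and then go and have tea.

With the ISO image method, the reimaged compute node is no longer dual boot whether or not the -dualboot=no option was given. You can confirm this with reclaimdisks.sh -check. The ISO is built as follows: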
[root@lunar dl360]# ./makeImageMedia.sh -preconf ../preconf_db.csv -stit -notests diskgroup -nodisktests db_img112330.iso

Done. Pre config verification OK
Please wait. Calculating md5 checksums for cellbits ...
Calculating md5 checksum for exaos.tbz ...
Calculating md5 checksum for dbboot.tbz ...
Calculating md5 checksum for dbfw.tbz ...
Calculating md5 checksum for kernel.tbz ...
Calculating md5 checksum for ofed.tbz ...
Calculating md5 checksum for sunutils.tbz ...
Calculating md5 checksum for hputils.tbz ...
Calculating md5 checksum for c7rpms.tbz ...
Calculating md5 checksum for commonos.tbz ...
Calculating md5 checksum for debugos.tbz ...
Calculating md5 checksum for dbrpms.tbz ...
Please wait. Making initrd ...
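......

With the ISO image method there is no USB stick to pull out, O(∩_∩)O haha~. The machine reboots a few times by itself (about 2 or 3 times, I forget), and when "Installation SUCCESSFUL" appears it is done.

3. Using ISO image + NFS
I have not tried this one, but according to the readme it builds an ISO file and copies the image bits into a subdirectory, and that directory of course lives on NFS:
makeImageMedia.sh x.iso -nfs /exports/images -dir dl180/11132
The readme has detailed steps; follow it to complete the procedure.

4. Using PXE+NFS
(1) First confirm that TFTP boot support is available; if it is not, install syslinux
yum install syslinux
(2) Build the image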
cd dl360/
./makeImageMedia.sh -pxe

Check the image files:

    cd /tftpboot/linux-install/dl360/PXE
    ls -l
        -rw-r--r-- 1 root root 38813575 Aug 19 10:39 initrd-11.2.3.2.1-130109-DL360.img
        -rw-r--r-- 1 root root 1325076480 Aug 19 10:39 nfsimg-11.2.3.2.1-130109-DL360.tar
        -rw-r--r-- 1 root root 69 Aug 19 10:39 nfsimg-11.2.3.2.1-130109-DL360.tar.md5
        -r-xr-xr-x 1 root root 3688864 Aug 19 10:39 vmlinux-11.2.3.2.1-130109-DL360

    cd /tftpboot/linux-install
    ls -l
        drwxrwxr-x 7 root root 4096 Aug 19 10:39 dl360
        -rw-r--r-- 1 root root 38813575 Aug 19 10:39 initrd-11.2.3.2.1-130109-DL360.img
        drwxr-xr-x 2 root root 4096 Aug 16 2012 msgs
        -rw-r--r-- 1 root root 1325076480 Aug 19 10:39 nfsimg-11.2.3.2.1-130109-DL360.tar
        -rw-r--r-- 1 root root 69 Aug 19 10:39 nfsimg-11.2.3.2.1-130109-DL360.tar.md5
        -rw-rw-r-- 1 root root 13100 Jul 25 2011 pxelinux.0
        drwxr-xr-x 2 root root 4096 Aug 19 09:15 pxelinux.cfg
        -r-xr-xr-x 1 root root 3688864 Aug 19 10:39 vmlinux-11.2.3.2.1-130109-DL360

(3) Configure the NFS exports and start the NFS server

    cat /etc/exports
    service nfs restart

(4) Install the TFTP server

    yum install tftp-server
    chkconfig --level 345 tftp on

(5) Edit the TFTP configuration file:

    /tftpboot/linux-install/pxelinux.cfg/default
    Note that the "kernel vmlinux-11.2.3.2.1-130109-DL360" entry in it is the kernel generated above

(6) Configure DHCP

    yum install dhcp
    mv /etc/dhcpd.conf /etc/dhcpd.orig
    chkconfig --level 345 dhcpd on

Check /etc/dhcpd.conf, for example:

    option ip-forwarding false; # No IP forwarding
    option mask-supplier false; # Don't respond to ICMP Mask req

    subnet 10.187.114.0 netmask 255.255.254.0 {
        option routers 10.187.114.1;
    }

    group {
        next-server 10.187.115.250;     # the PXE server
        filename "linux-install/pxelinux.0";
        option root-path "10.187.115.250:/tftpboot/linux-install";
        host exadbmel02 {
            hardware ethernet 00:21:28:A3:27:68;  # MAC address of eth0
            fixed-address 10.187.115.225;   # eth0 of the node to be reimaged
        }
    }

The eth0 information can also be confirmed through ILOM: System Information -> Components -> /SYS/MB/NET0
or by ssh to ILOM and running: show /SYS/MB/NET0
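
When many nodes are reimaged, dhcpd.conf needs one host entry per node. The following is a minimal sketch that prints such an entry from a host name, an eth0 MAC address and a fixed IP. The dhcp_host_entry function is a hypothetical helper, and the values passed to it are the ones from the dhcpd.conf example above.

```shell
# dhcp_host_entry NAME MAC IP
# Hypothetical helper: prints an ISC dhcpd "host" block for one node.
dhcp_host_entry() {
    printf 'host %s {\n' "$1"
    printf '    hardware ethernet %s;\n' "$2"
    printf '    fixed-address %s;\n' "$3"
    printf '}\n'
}

dhcp_host_entry exadbmel02 00:21:28:A3:27:68 10.187.115.225
```

The printed block belongs inside the group section, so that the node inherits next-server and filename.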

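(7) Restart all the network-related services:

    service dhcpd restart
    service xinetd restart
    service iptables stop

With PXE configured this way, PXE+NFS is ready to use; the rest of the process is similar to the ISO image method.

For the four reimage methods above and their detailed steps, see the readme: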


 Copyright (c) 2009, 2011, Oracle and/or its affiliates. All rights reserved. 


       ================================================================

       Imaging install media options and imaging procedures for Exadata

       ================================================================

DO EVERYTHING AS root USER

   |-------------------------------------------------------------------------|
   | NOTE:  FOR SUN Factory the process involves 2 rounds of PXE+NFS imaging |
   |        Scroll all the way to the end of the document to see the overall |
   |        steps in the process and sample pxe configuration files.         |
   |-------------------------------------------------------------------------|

Creation of the installation USB or ISO:

Download the production ImageMaker.tar.zip files on some machine
with Oracle Enterprise Linux 64bit or RHEL 64 bit that has
  grub 0.97 and has tar with bzip2 support
     - grub --version will show the grub version


As root user extract the ImageMaker.tar.zip file 
  unzip ImageMaker.tar.zip
  tar -pxvf ImageMaker.tar

Cell node image extracts to dl180
DB node image extracts to dl360

The makeImageMedia.sh script inside the above directories is used to
create the actual image installation media.

The installation media can be

1. USB flash thumb drive
2. ISO image that may be used on DVD or as remote virtual media using
   LightsOut remote virtual media capabilities.
3. ISO+NFS - where a small iso file is used to boot the system and the
   imaging payload is hosted on a NFS server.
4. PXE+NFS

     ---------------------------------------------------------------------
     | Run all commands AS root from inside the dl180 or dl360 directory |
     ---------------------------------------------------------------------

./makeImageMedia.sh 
  [-preconf <prconf.csv file full pathname>]
  [
      <dvd iso file name>
    | [-pxe [-pxeout <pxe output filename> ]] 
    | [<nfs iso filename> -nfs nfs_share -dir nfs_dir [-nfs_ip <ip addr for nfs server>]
        [-dhcp] 
        [-logpath <[lognfs_ip:]/full path to writeable nfs share>]
      ]
  ]


Install media preparation
-------------------------

  USB thumb drive - also known as the CELLINSTALL USB
  ---------------------------------------------------
  NOTE: It is best to have no other external USB storage
        devices connected to the machine on which you 
        prepare the installer USBs.
 
  Insert empty USB thumb drives of size between 2GB and 32GB, 
  and follow prompts after executing:
 
     ./makeImageMedia.sh

  ISO - That may be burnt on DVD and used for install
  ---------------------------------------------------

     ./makeImageMedia.sh <iso file name>
     Example: ./makeImageMedia.sh cell.iso
 
  NFS+ISO:
  --------
      
     nfs iso is either created on the nfs server itself where the
     nfs export path is nfs_share e.g. /exports/images and nfs_dir is the
     subdirectory of nfs_share where image bits are copied by the iso creation
     command e.g. dl180/11132.
 
     OR
 
     The iso may be created anywhere using the nfsip option to supply the
     ip address of the nfs server and then the contents of nfs_share directory
     must be copied to nfs server at the EXACT same path as nfs_share
 
     NOTE: nfs iso MUST either be built on the nfs server as root user
           OR
           you must supply the ip address for nfs server with -nfsip 
 
     Example:
    
     nfs iso built on the nfs server:
  
       makeImageMedia.sh x.iso -nfs /exports/images -dir dl180/11132
 
       Will create x.iso and copy image bits to /exports/images/dl180/11132
       You can copy the preconf.csv file then to /exports/images/dl180/11132
 
       You can of course embed the preconf.csv file in the iso itself with
 
       makeImageMedia.sh x.iso -nfs /exports/images -dir dl180/11132 \
                               -preconf preconf.csv
     
     nfs iso NOT built on nfs server:
 
       makeImageMedia.sh x.iso -nfsip 123.123.123.123 -nfs /exports/images \
                               -dir dl180/11132
      
       Will create x.iso and copy image bits to /exports/images/dl180/11132
       You can copy the preconf.csv file then to /exports/images/dl180/11132
      
       You must now copy the entire contents of /exports/images/dl180/11132
       onto the real nfs server 123.123.123.123 exactly at path
       /exports/images/dl180/11132 and the nfs server must export
       /exports/images
 
      -logpath option now may be passed to makeImageMedia.sh to extract the logs
       from imaging and zero and first boot validations. See the logpath option
       syntax in PXE support
 
      -dhcp option now may be passed to makeImageMedia.sh to use dhcp to get
       initial ip address during imaging. 

      -multiprof option will create multiprofile images

  Optional command line options for USB / ISO / NFS+ISO installation media
  --------------------------------------------------------------------------

      -factory
         print special [FACTORY_...] messages for use by factory.

      -kerver <kernel_version>
         overwrite default kernel. By default the installed kernel version depends
         on hardware type (system product name). List of kernels and hardware
         dependencies defined in the "kernel_ver" line at the very top of
         makeImageMedia.sh script.

      -dualboot <yes|no>
         overwrite default dual boot behaviour. It's only applicable for the
         db node. By default dual boot feature (installing Linux image and
         prereserving space for Solaris) depends on hardware type (system
         product name). By default X4170 M2 and X4800 (G5) servers have dual
         boot.
         
      -stit
         signals to force reimage even if installation exists on the machine

      -reboot-on-success
         Do not wait for operator to power off machine on successful image,
         just reboot. Useful for mass unattended imaging using preconf option.
         DO NOT use in factory process.

      -nodisktests
         do not run disktests at zero boot. They take 6-14 hours!

      -notests <group for validations to be skipped, e.g. diskgroup>
         skips all validations with given vldgroup name. For example, if
         diskgroup is given as the group name, then each validation with
         vldgroup set to diskgroup will be skipped. diskgroup today includes
         disktests, calibration and diskhealth. This is one more way to skip
         disktests in addition to the nodisktests option.

  PXE+NFS
  -------
      
      Do everything as root user
      
      0. tar -pxvf the ImageMaker.tar file resulting in dl180 or dl360
         directory. cd to the directory dl180 or dl360
      
      1. You make PXE build using
      
      ./makeImageMedia.sh -pxe [-pxeout <ImageName>]
      
      This will create 3 files in ./PXE directory
      
      kernel - vmlinux-<version>-ImageName-<DL180 or DL360 cell or db respectively>
      initrd - initrd-<version>-ImageName-<DL180 or DL360 cell or db respectively>.img
      image  - nfsimg-<version>-ImageName-<DL180 or DL360 cell or db respectively>.tar
      
      The kernel and initrd files are sent to the node being imaged by the PXE
      server.
      
      The image tar file should be extracted in the nfs_share directory by root
      user as shown in next step.
      
      Example:
      ./makeImageMedia.sh -pxe -pxeout dl180
      PXE nfs image: /dani/11132/dl180/./PXE/nfsimg-11.1.3.2.0-dl180-DL180.tar
      PXE initrd: /dani/11132/dl180/./PXE/initrd-11.1.3.2.0-dl180-DL180.img
      PXE kernel: /dani/11132/dl180/./PXE/vmlinux-11.1.3.2.0-dl180-DL180
      
      2. Prepare nfs share
     
      Copy the nfsimg tar and md5 files to nfs_share directory.

      Example:
      Using nfs_share from the following example, where the nfs server that holds
      the image content is 123.45.67.189 and the nfs_share on it is
      /vol/exadata/dl180, and assuming the image tar file was in /root.
      
      cp /root/nfsimg-11.1.3.2.0-dl180-DL180.* /vol/exadata/dl180
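
After copying, it may be worth checking the tar file against its md5 file. The sketch below assumes the .md5 file is in the standard md5sum format (checksum, two spaces, file name), which the readme does not state. The verify_nfsimg function is a hypothetical helper.

```shell
# verify_nfsimg DIR
# Hypothetical helper: checks every *.md5 file in DIR with "md5sum -c".
# Returns non-zero if there is no .md5 file or if any checksum mismatches.
verify_nfsimg() (
    cd "$1" || exit 1
    for f in *.md5; do
        [ -e "$f" ] || { echo "no .md5 files in $1" >&2; exit 1; }
        md5sum -c "$f" || exit 1
    done
)

# Example usage on the nfs server (commented out so the sketch runs anywhere):
# verify_nfsimg /vol/exadata/dl180
```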

      Releases older than 11.2.1.3.0 should extract the nfsimg files:
 
        Extract nfsimg-11.1.3.2.0-dl180-DL180.tar to the nfs_share
        "tar -x -p -v -C <nfs_share> -f nfsimg-11.1.3.2.0-dl180-DL180.tar"
      
        Example:
        Using nfs_share from the following example, where the nfs server that holds
        the image content is 123.45.67.189 and the nfs_share on it is
        /vol/exadata/dl180, and assuming the image tar file was in /root.
      
        "tar -x -p -v -C /vol/exadata/dl180 -f /root/nfsimg-11.1.3.2.0-dl180-DL180.tar"
      
      3. Configure the PXE server
      
      Assume PXE server has both DHCP and TFTP daemons started on it.
      
      Copy initrd and kernel into /tftpboot on the PXE server.
      
      Create the PXE boot configuration file in /tftpboot/pxelinux.cfg/<hexadecimal_ip_address_of_the_imaging_machine>.
      You can also use the MAC address for the eth0 NIC for filename instead of
      hexadecimal ip address. 
      
      For ip address 123.123.123.123 the filename based on ip address will be
      7B7B7B7B. 
      If the MAC address is used, and it was 12:34:56:78:90:12, then the file will
      be 01-12-34-56-78-90-12.
      
      Please check your specific PXE server requirements, the above names are what
      were used in our testing and your PXE server may need slightly different names
      or settings.
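
The two file names described above can be derived with a few lines of shell. This is a sketch only: ip_to_pxe_name and mac_to_pxe_name are hypothetical helper names. The MAC-based name is printed in lower case, following the usual PXELINUX convention, which is an assumption beyond what the readme states.

```shell
# ip_to_pxe_name A.B.C.D
# Prints the IP address as 8 uppercase hex digits.
ip_to_pxe_name() (
    IFS=.
    set -- $1          # split the dotted quad into $1..$4
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
)

# mac_to_pxe_name AA:BB:CC:DD:EE:FF
# Prints 01-aa-bb-cc-dd-ee-ff (01 is the Ethernet hardware type).
mac_to_pxe_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}

ip_to_pxe_name 123.123.123.123      # prints 7B7B7B7B
mac_to_pxe_name 12:34:56:78:90:12   # prints 01-12-34-56-78-90-12
```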
      
      Examples of configuration file
      
      Example 1
      ---------
      # With dhcp and disktests 
      default linux
      prompt 1
      timeout 72
      label linux
      kernel vmlinux-11.1.3.2.0-dl180-DL180
      append initrd=initrd-11.1.3.2.0-dl180-DL180.img pxe stit updfrm dhcp sk=123.45.67.189:/vol/exadata/dl180 preconf=123.45.67.123:/vol/configs/exadata/allexadataservers.csv 

      Example 2
      ---------
      # With explicit ethX address instead of dhcp and no disktests or
      # calibration
      default linux
      prompt 1
      timeout 72
      label linux
      kernel vmlinux-11.1.3.2.0-dl180-DL180
      append initrd=initrd-11.1.3.2.0-dl180-DL180.img pxe stit updfrm reboot-on-success notests=diskgroup sk=123.45.67.189:/vol/exadata/dl180 eth0=123.123.123.123:255.255.254.0:123.123.123.1 preconf=123.45.67.123:/vol/configs/exadata/allexadataservers.csv
      
      
      Explanation of command line options:
      -----------------------------------
      
      pxe
       - signals this is pxe imaging
         mandatory parameter
      
      factory
       - print special [FACTORY_...] messages for use by factory. 
         optional parameter.
      
      stit
       - signals to force reimage even if installation exists on the machine
         optional parameter
      
      updfrm
       - check hardware and firmware versions. Update firmware where applicable.
         optional parameter.

      kerver=<kernel_version>
         overwrite default kernel. By default the installed kernel version depends
         on hardware type (system product name). List of kernels and hardware
         dependencies defined in the "kernel_ver" line at the very top of
         makeImageMedia.sh script.

      dualboot=<yes|no>
         overwrite default dual boot behaviour. It's only applicable for the
         db node. By default dual boot feature (installing Linux image and
         prereserving space for Solaris) depends on hardware type (system
         product name). By default X4170 M2 and X4800 (G5) servers have dual
         boot.
      
      dhcp
       - dhcp is optional parameter and should NOT be given with eth0 option
         If given, dhcp is used to obtain the initial dhcp address instead of
         requiring to pass the eth0 information.
    
      ethX=<ip>:<netmask>:<gateway>
       - ethX is optional parameter. Do NOT give with dhcp above.
       - where X is one of 0,1,2,3 on Sun and 0 on HP machines
         If both dhcp and ethX are absent imaging will enter
         interactive mode and ask for Ethernet ip, netmask and
         gateway information
      
      sk=<nfsip>:<nfs_share>
       - sk is mandatory parameter for the PXE boot. MUST
         use IP address not the hostname for NFS server.
         nfs_share is the full path to directory where the 3 files from
         PXE directory are available
      
      logpath=[nfsip:]<full path to writeable nfs share>
       - logpath is optional parameter
         If given it will copy the imaging, zero and first boots logs to the
         writable nfs share location in single tar bzip2 file <serial_num>.tbz.
         The serial_num is the serial number of the system obtained as
           dmidecode -s system-serial-number
     
      ----------------------------------------------- 
      Development ONLY options NOT for use in Factory
      -----------------------------------------------
      
      reboot-on-success
       - Do not wait for operator to power off machine on successful image, just
         reboot. Useful for mass unattended imaging using preconf option.
         DO NOT use in factory process.
         optional parameter
      
      multiprof
       - Image the node as multi profile enabled
         This parameter is optional
         NOTE: Do NOT build the images with -multiprof to makeImageMedia.sh
               If you do that image can only be used to do multi profile nodes.
      
      nodisktests
       - do not run disktests at zero boot. They take 6-14 hours!
         This parameter is optional
      
      notests=<group for validations to be skipped, e.g. diskgroup>
       - Skips all validations with given vldgroup name. For example, if
         diskgroup is given as the group name, then each validation with
         vldgroup set to diskgroup will be skipped. diskgroup today includes
         disktests, calibration and diskhealth. This is one more way to skip
         disktests in addition to the nodisktests option.
         This parameter is optional
      
      preconf=[<preconf_nfsip>:]<full path name of preconf_file on nfs server>
       - preconf is optional parameter
         The nfsip MUST be IP address of the NFS server not its hostname
         The preconf_nfsip can be same or different than the nfsip in sk 
         option, allowing the preloaded configuration file to reside on different
         subtree or entirely different nfs server from that of the imaging bits
         on the nfsip nfs server.
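
The rules above (pxe and sk are mandatory, sk takes an IP address, and dhcp and ethX must not be combined) can be enforced when the append line is generated. The sketch below is one way to do that. The pxe_append function and its argument order are hypothetical, and the address check is deliberately crude.

```shell
# pxe_append INITRD SK NET [EXTRA...]
#   SK    is <nfs ip>:<nfs_share>
#   NET   is either "dhcp" or "ethX=<ip>:<netmask>:<gateway>"; passing exactly
#         one value keeps dhcp and ethX mutually exclusive
#   EXTRA are optional words such as stit, updfrm, notests=diskgroup
pxe_append() (
    [ $# -ge 3 ] || { echo "usage: pxe_append INITRD SK NET [EXTRA...]" >&2; exit 1; }
    initrd=$1; sk=$2; net=$3
    shift 3
    case $sk in
        [0-9]*.*.*.*:/*) ;;   # crude check: starts with a digit, has an absolute path
        *) echo "sk must be <nfsip>:<nfs_share>" >&2; exit 1 ;;
    esac
    case $net in
        dhcp|eth[0-3]=*:*:*) ;;
        *) echo "NET must be dhcp or ethX=<ip>:<netmask>:<gateway>" >&2; exit 1 ;;
    esac
    line="append initrd=$initrd pxe $net sk=$sk"
    for opt in "$@"; do line="$line $opt"; done
    printf '%s\n' "$line"
)

pxe_append initrd-11.1.3.2.0-dl180-DL180.img 123.45.67.189:/vol/exadata/dl180 dhcp stit updfrm
```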
      
 
Installation process
--------------------

  Preinstall steps for HP DL180
  -----------------------------

     Imaging will stop and require you to confirm to continue if:
      a) P400 Smart Array disk controller is not in PCIe x8 slot
      b) There are additional USBs besides the CELLINSTALL USB and the blank
         USB for use as CELL boot USB
      c) All drives are not identical model and make

      On the target machine set up the BIOS boot sequence such that
      - Hard disk drives is the first in the boot sequence
      - Within the hard disk drives option the USB flash disk(s) are before the P400
        disk controller
      - Disable removable drives
      Use the screen shots in the doc directory as guide.

  Preinstall steps for HP DL360
  -----------------------------

     Imaging will FAIL if following are true:
      a) Infiniband card is not in PCIe x8 slot

      On the target machine set up the BIOS boot sequence such that
      - USB flash is first in the boot sequence
      Use the screen shots in the doc directory as guide.

  Preinstall steps for SUN X4275
  ------------------------------

     Imaging will stop and require you to confirm to continue if:
      a) LSI 9261-8i disk controller is not in PCIe x8 slot
      b) There are additional USBs besides the CELLINSTALL USB and the blank
         USB for use as CELL boot USB
      c) All drives are not identical model and make

      On the target machine set up the BIOS boot sequence such that
      - The CELLINSTALL USB is first in boot order
      - The internal CELLBOOT USB (UNIGEN) is the second after the CELLINSTALL
        USB
      - The LSI disk controller is next
      Use the screen shots in the doc directory as guide.

  Preinstall steps for SUN X4170
  ------------------------------
 
    Same as SUN X4275


      - Ignore any messages as stated in the Things to ignore section
        below
      - If BIOS, Disk controller or disk firmware needs update the imaging
        process will update the firmware and try to power cycle the machine
        using ipmi.
      -----
      ALERT: It is possible that the machine may not boot back after such power
      -----  cycle due to issues with BIOS boot order being reset or the ipmi
             power cycle not properly able to complete. Please manually power
             cycle the machine to continue imaging.

      - After imaging and automatic creation of the internal CELL boot USB machine
        will launch several health checks and long disk tests. 
          SAS 600GB drive disk tests will take up to 9 hours
          SATA 2TB drive disk tests will take up to 14 hours
          SAS 450GB drive disk tests will take up to 12 hours
          MDL SAS 1TB (SATA 7200RPM 1TB drives) drive disk tests will take up to 48 hours

      (A) On success of validation tests
           When all tests pass the machine will indicate the success of installation
           on the console and wait for you to power off the machine.

           In case of a reimage the machine may come to "localhost login:"
           prompt. Login as root/welcome1 and reboot the machine.

      (B) On failure of validation tests
            When a validation fails the machine will prompt you to choose to rerun
            the validations on reboot. You must choose to re-run the tests.
            After you finish making the choice the machine will either present 
            "localhost login:" prompt or exit to a shell. You can logon as root and
            password welcome1 if login prompt is presented. Please
             1. Examine the log files in /var/log/cellos/validations/
                to identify the cause of failure. Correct the problem and
                reboot the machine. The machine will rerun the tests unless
                you had chosen not to re-run them.
             2. If you can not easily identify the cause, please reboot the machine
                to see if the checks pass. If you get prompted for hostname and other
                configuration information, you should poweroff the machine.
     
Install steps
-------------

  Using CELLINSTALL USB:
  ----------------------
     Insert the CELLINSTALL usb in any USB slot on target machine and boot the machine
      - Assuming the machine is bare metal it will automatically boot from the USB and
        the imaging process will start automatically

  Using ISO:
  ---------
     Boot the system using the ISO and follow prompts. For bare metal imaging
     will start automatically

  Using ISO+NFS:
  -------------
     Boot the system with the iso and follow prompts. For bare metal imaging
     will start automatically

  Using PXE+NFS:
  -------------
     Boot the system using PXE by pressing F12 during BIOS
     initialization splash screens. For bare metal imaging
     will start automatically.

     You can also use "ipmitool chassis bootdev pxe" from already imaged
     system, to force the system to boot one time on next reboot.
     This is useful for mass re-imaging of systems.
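
For mass re-imaging, the same request can be sent to each node's service processor over the network. The sketch below only prints the commands so they can be reviewed first. It assumes the ILOMs accept IPMI over LAN (lanplus) for user root and that the password is supplied through the IPMI_PASSWORD environment variable (the -E option). The pxe_reboot_cmds function and the ILOM host names are hypothetical.

```shell
# pxe_reboot_cmds ILOM_HOST...
# Hypothetical helper: prints (does not run) the ipmitool commands that set
# a one-time PXE boot and then power cycle each node.
pxe_reboot_cmds() {
    for ilom in "$@"; do
        printf 'ipmitool -I lanplus -H %s -U root -E chassis bootdev pxe\n' "$ilom"
        printf 'ipmitool -I lanplus -H %s -U root -E chassis power cycle\n' "$ilom"
    done
}

pxe_reboot_cmds dm01db01-ilom dm01db02-ilom
```

After review, the printed lines can be piped to sh with IPMI_PASSWORD exported.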

      
Unattended first boot configuration:
-----------------------------------
        
    There is now support for unattended first boot configuration as
    long as you build it in the image.
 
    The steps to use unattended first boot configuration need you
    to build the image media with new option to makeImageMedia.sh
 
    See the sample_preconf.csv file for example preconfiguration file.
    This file can be
     (A) passed to the makeImageMedia.sh, and/or
     (B) it can be copied to the nfs_share location when using PXE and/or
     (C) it can be copied to the nfs_share/nfs_dir location if using nfs iso.
     (D) it can be copied to the / directory of the install USB
 
    If the file is inserted in the install media using (A), and if it
    is also passed with methods (B) to (D), then the file from (B) to
    (D) takes precedence over the file passed using (A).
 
    This allows you to update the file after creating the installer media
    - USB or the iso+nfs or the pxe+nfs, so that you can image more
    machines using the same media.
 
    Preparing the preconf.csv file:
    -------------------------------
 
     1. First line with "Cell Preconfig version" is mandatory
    
     2. Title line starting with "Hostname, Domain, ...." is mandatory 
        and format is fixed
    
     3. You MUST NOT change these two lines.
    
     4. The line started with "common" keyword in Hostname column is 
        optional, and provides a way of supplying common values.
  
       4.1. The "common" line MUST not contain these
            a. "eth0 mac address"
            b. "eth0 ip"
            c. "bond0 ip"
            d. "hostname"
 
       4.2. Multiple "common" lines are allowed. Each next common
            line overwrites all previous common settings.
    
     5. Any individual line for the host MUST have unique hostname, eth0 
        mac address, eth0 ip and bond0 ip values.
 
        5.1. If individual column is empty the value from the common
             column is used
        5.2. All values are mandatory except nameservers and NTP servers
        5.3. The full hostname is result of "$HOSTNAME.$DOMAIN"
        5.4. Nameservers and NTP servers have to be separated by space.
 
     6. Any line starting with # is treated as comment line.
 
 NOTE: It's a good practice to validate syntax of .csv file. You can do it using
 
 <dl180 or dl360>/initrd/opt/oracle.cellos/ipconf -verify -preconf <path_to_csv_file>
 
 dl180 or dl360 are top level directories when you extract the ImageMaker
 tar.zip files.
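
Before running ipconf -verify, a quick structural check of the file can catch obvious mistakes. The sketch below covers only three of the rules listed above (version line, title line, unique hostnames) and assumes the hostname is the first comma-separated column, as the title line suggests. The check_preconf function is a hypothetical helper and does not replace ipconf -verify.

```shell
# check_preconf FILE
# Hypothetical helper. Exits non-zero and prints a message if:
#  - line 1 does not mention "Cell Preconfig version"
#  - no title line starting with "Hostname" exists
#  - two host lines (not comments, not "common") share a hostname
check_preconf() {
    awk -F, '
        NR == 1 {
            if ($0 !~ /Cell Preconfig version/) { print "missing version line"; bad = 1 }
            next
        }
        /^#/ || /^[ \t]*$/ { next }        # comment or blank line
        /^Hostname/ { title = 1; next }    # fixed title line
        { h = $1; gsub(/[ \t]/, "", h) }   # hostname column, whitespace stripped
        h == "common" { next }             # common lines carry no hostname
        seen[h]++ { print "duplicate hostname: " h; bad = 1 }
        END {
            if (!title) { print "missing title line"; bad = 1 }
            exit (bad ? 1 : 0)
        }
    ' "$1"
}

# Example usage (commented out so the sketch runs anywhere):
# check_preconf /tmp/preconf.csv
```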
      
--------------------------------------------------------------------------------
SAMPLE of preconf.csv file - See csv files in /opt/oracle.SupportTools/firstconf
--------------------------------------------------------------------------------

Things to ignore safely during install:
--------------------------------------

 HP:
 --
   1) "cciss/cXdYpZ Invalid partition table"
       The above message will repeat several times with X,Y, and Z are some integers
          - X is the P400 smart array disk controller slot number
          - Y is the disk number starting with 0 for the slot
          - Z is the partition number on the disk 
       Reason for this is unknown and the message is harmless

 SUN+HP:
 ------
   1) "RAID1 conf printout:..."
       This is the software RAID printout we are not yet able to find a way to suppress
   2) tar: <file>: <date stamp> is X s in future

Known Issues and work around:
-----------------------------

 1) ONLY on HP DL180: With the install USB and a blank USB in the machine, you may get "Disk
     error. Invalid disk press any key to continue..."
     Fix the BIOS boot sequence as shown in the screenshots
     (.gif files in the ScreenShots directory).
 2) Installer USB does not work from some USB slots:
     Solution: Try different slots. If none works, create a fresh USB and try again. If that
               also fails, get a new machine.

Miscellaneous tips:
-------------------

NOTE: 1) The disk controller on the Exadata cell nodes must be in a PCIe x8 slot for optimal
         performance (performance can degrade by 50% if this is not the case).
      2) The Infiniband card must be in a PCIe x8 slot on the compute (aka database) nodes.

       You can confirm the speed of the slot as follows:
       As root, run lspci -vvv on the freshly imaged box and check the Link speed and width
       reported for the disk controller and the Infiniband cards.
       Look at the lines marked ====> in the sample outputs of the lspci -vvv command below.
       You should have matching lines for the disk controller and Infiniband
       cards on your machines. If NOT, you will need to open the machine
       and relocate the corresponding cards to the correct slots.

For DL180 - the Exadata cell nodes:
----------------------------------

03:00.0 RAID bus controller: Hewlett-Packard Company Smart Array Controller (rev 03)
	Subsystem: Hewlett-Packard Company P400 SAS Controller
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 169
	Region 0: Memory at fcc00000 (64-bit, non-prefetchable) [size=1M]
	Region 2: I/O ports at e800 [size=256]
	Region 3: Memory at fcbff000 (64-bit, non-prefetchable) [size=4K]
	Expansion ROM at fcb80000 [disabled] [size=256K]
	Capabilities: [b0] Express Endpoint IRQ 0
		Device: Supported: MaxPayload 512 bytes, PhantFunc 0, ExtTag-
		Device: Latency L0s unlimited, L1 unlimited
		Device: AtnBtn- AtnInd- PwrInd-
		Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
		Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
		Device: MaxPayload 128 bytes, MaxReadReq 2048 bytes
		Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 0
		Link: Latency L0s <2us, L1 unlimited
		Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
====>		Link: Speed 2.5Gb/s, Width x8
	Capabilities: [d4] MSI-X: Enable+ Mask- TabSize=4
		Vector table: BAR=0 offset=000fe000
		PBA: BAR=0 offset=000ff000
	Capabilities: [e0] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [ec] Vital Product Data
	Capabilities: [100] Power Budgeting

05:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR] (rev a0)
	Subsystem: Mellanox Technologies MT25418 [ConnectX IB DDR]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 169
	Region 0: Memory at fce00000 (64-bit, non-prefetchable) [size=1M]
	Region 2: Memory at fa800000 (64-bit, prefetchable) [size=8M]
	Region 4: Memory at fcdfe000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] Vital Product Data
	Capabilities: [9c] MSI-X: Enable+ Mask- TabSize=256
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00001000
	Capabilities: [60] Express Endpoint IRQ 0
		Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag+
		Device: Latency L0s <64ns, L1 unlimited
		Device: AtnBtn- AtnInd- PwrInd-
		Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
		Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
		Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
		Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 8
		Link: Latency L0s unlimited, L1 unlimited
		Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
====>		Link: Speed 2.5Gb/s, Width x4


For DL360 - the database aka compute nodes:
------------------------------------------

13:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev 20)
	Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 177
	Region 0: Memory at fdf00000 (64-bit, non-prefetchable) [size=1M]
	Region 2: Memory at df800000 (64-bit, prefetchable) [size=8M]
	Region 4: Memory at d0000000 (64-bit, prefetchable) [size=128M]
	Capabilities: [40] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] Vital Product Data
	Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
		Address: 0000000000000000  Data: 0000
	Capabilities: [84] MSI-X: Enable- Mask- TabSize=32
		Vector table: BAR=0 offset=00082000
		PBA: BAR=0 offset=00082200
	Capabilities: [60] Express Endpoint IRQ 0
		Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
		Device: Latency L0s <64ns, L1 unlimited
		Device: AtnBtn- AtnInd- PwrInd-
		Device: Errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
		Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
		Device: MaxPayload 128 bytes, MaxReadReq 4096 bytes
		Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 8
		Link: Latency L0s unlimited, L1 unlimited
		Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
====>		Link: Speed 2.5Gb/s, Width x8
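
Scanning a full lspci -vvv dump for the marked lines is tedious across a rack.
The sketch below, which assumes output in the format of the samples above
(newer lspci versions label the same line "LnkSta:"), prints the negotiated
link width next to each device. The name link_widths is hypothetical.

```shell
#!/bin/sh
# link_widths: read lspci -vvv output on stdin and print
# "<negotiated width><TAB><device line>" for every device that reports a link.
link_widths() {
    awk '
        # A device header starts in column 1 with bus:device.function
        /^[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-9a-f] / { dev = $0 }
        # Negotiated link line; "Link: Supported Speed" (capability) does not match
        /(Link|LnkSta): Speed / {
            if (match($0, /Width x[0-9]+/))
                print substr($0, RSTART + 6, RLENGTH - 6) "\t" dev
        }
    '
}
```

Usage: lspci -vvv | link_widths, then check that the disk controller and
Infiniband lines show the width you expect for their slots.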

###########################  SUN FACTORY OVERALL PROCESS #################################################

 As of Sep 27, 2009:

  NOTE: ----------------------------
    ALL steps below assume at most a single DBM, i.e. one full rack.
    If more than one DBM is to be interconnected, then the scheme below needs
    to be modified accordingly to avoid ip address conflicts.
  NOTE: ----------------------------

  The factory process at SUN will make 2 imaging passes using PXE+NFS.
  The first successful imaging pass will leave the systems (X4170 and X4275)
  configured with all the basic network information such as the hostname, ip
  address, etc. Factory will do this using the preconf option to imaging to
  feed the configuration information. The exact mandatory PXE options are
  listed below. Once the nodes are fully up, factory can run various tests
  on them. The mandatory tests are listed below.

  Assuming all tests pass in the first round, the second round of imaging
  will wipe out the first image and re-image the nodes. This round will use
  different options to PXE. The list of options to use is listed below.

  This second round will leave the systems configured with fixed private
  hostnames and ip addresses as outlined in the following sections.

  To do this Factory will need to ensure that the MAC addresses for eth0
  interfaces used in the first round are plugged in to the template preconf
  file for second round in exact order as documented in the template file.

  Factory will need to confirm at end of round 2 that the systems can come up
  and carry the private ip addresses and hostnames.

  Note that the "pxe" and "sk" options are mandatory for PXE.
  Of the optional parameters:

  Round 1 image PXE options:
  --------------------------
   Must use:
    1. stit
    2. updfrm
    3. factory
    4. reboot-on-success
    5. preconf
   Must NOT use:
    1. notests=diskgroup
    2. multiprof

  Round 2 image PXE options:
  --------------------------
   Must use:
    1. stit
    2. updfrm
    3. factory
    4. notests=diskgroup
    5. preconf - based on second round template
    6. reboot-on-success
   Must NOT use:
    1. multiprof
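
The two option lists above differ only slightly, so they are easy to mix up.
The following is a minimal sketch of a checker for them, not a factory tool.
The function names has_opt and check_round are hypothetical, and the sketch
only compares option names (an option given as name=value counts as name). How
the options are actually passed on the PXE boot line is not shown here.

```shell
#!/bin/sh
# has_opt <name> <opts...>: succeed if a token equals <name> or starts with "<name>=".
has_opt() {
    name=$1
    shift
    for tok in "$@"; do
        case $tok in
            "$name" | "$name"=*) return 0 ;;
        esac
    done
    return 1
}

# check_round <1|2> <opts...>: print every violation of the lists above;
# return non-zero if there is at least one.
check_round() {
    round=$1
    shift
    must="pxe sk stit updfrm factory reboot-on-success preconf"
    must_not="multiprof"
    case $round in
        1) must_not="$must_not notests=diskgroup" ;;
        2) must="$must notests=diskgroup" ;;
        *) echo "unknown round: $round"; return 2 ;;
    esac
    rc=0
    for o in $must; do
        has_opt "$o" "$@" || { echo "round $round: missing $o"; rc=1; }
    done
    for o in $must_not; do
        if has_opt "$o" "$@"; then
            echo "round $round: must not use $o"
            rc=1
        fi
    done
    return $rc
}
```

Usage: check_round 2 pxe sk stit updfrm factory notests=diskgroup preconf reboot-on-success
prints nothing and returns 0.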

  Distributed/Rack-wide tests to be run after successful image at Round 1:
  -----------------------------------------------------------------------

  Login to any one node as root/welcome1. Let us call this the master
  test node (MTN).

   0. Check that the validations succeeded
      a. cd /var/log/cellos
      b. Examine the file vldrun.xx.log for any failures. If there are
         failures, then for each failed validation examine the suggested
         remedy file "<validation name>.SuggestedRemedy" in the validations
         subdirectory.
         If no remedy file exists, look at the corresponding log files to
         track down the cause of the failure and correct it.
      
   1. Check Infiniband Switch software and firmware versions
      a. cd /opt/oracle.SupportTools
      b. ./CheckSWProfile.sh -I <comma separated list of switch ips no spaces>
      Follow prompts

   2. Establish root user ssh equivalence between MTN and any other node:
      a. ssh-keygen -t dsa
      b. ssh-keygen -t rsa
         Accept defaults so the ssh keys are created for root user
      c. create a file called "nodes" listing one hostname (short hostname -
         i.e. hostname -s output) per line for all nodes in the rack.
      d. cd /opt/oracle.SupportTools
      e. ./setup_ssh_eq.sh "full pathname to the nodes file" root welcome1
      This pushes the ssh keys to all nodes establishing the ssh trust for root
      user to all nodes from MTN.
 
   3. Verify the Infiniband connectivity and topology:
      a. cd /opt/oracle.SupportTools/ibdiagtools
      b. ./verify-topology -factory [-t quarterrack]

      Sample output of this for successful runs is in SampleOutputs.txt file in
      same directory for a full rack.

   4. Check that the Infiniband performance is acceptable. Failures indicate
      problems with links, badly seated HCAs, wrong configuration on switch,
      etc.
      a. cd /opt/oracle.SupportTools/ibdiagtools
      b. Create a file of DB nodes (Sun X4170 or HP DL360) with one Infiniband IP
         address per line. If there are fewer than 8 nodes in a full DBM or fewer
         than 4 in a half DBM, there is some problem in IB connectivity
           ibhosts | awk '/S [0-9.]* / {print $8}' | tee dbips.ora
      c. Create a file of Cell nodes (Sun X4275 or HP DL180) with one Infiniband IP
         address per line. If there are fewer than 8 nodes in a full DBM or fewer
         than 4 in a half DBM, there is some problem in IB connectivity
           ibhosts | awk '/C [0-9.]* / {print $8}' | tee cellip.ora

      d. Set up all-to-all root user ssh equivalence
      d.1. Create a file allip.ora with one ip address per node
           ibhosts | awk '/[SC] [0-9.]* HCA\-1/ {print $8}' | tee allip.ora
           Now use the setup_ssh_eq.sh script to set up passwordless user
           equivalence from the current node to all other nodes
           ../setup_ssh_eq.sh allip.ora root <root_password>
      d.2. Now set up user equivalence between all other nodes in the rack
           ./infinicheck -b -g dbips.ora -c cellip.ora -u root -s
           Respond to the prompts, or script this part with expect to
           automate it.

      e. Create a smaller file dbip.ora out of dbips.ora with just one ip per
         compute node. So if there are 2 compute nodes with 4 HCAs in total,
         dbip.ora should have just 2 ip addresses, one from each node.
          ibhosts | awk '/S [0-9.]* HCA\-1/ {print $8}' | tee dbip.ora

      f. Run the check. Option -b for bare metal will suppress the warnings
         about cellinit.ora and cellip.ora files not found.
           ./infinicheck -b -g dbip.ora -c cellip.ora
      g. To view only performance run results 
           ./infinicheck -d -p
      h. To clean up after a run
           ./infinicheck -z
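
Steps 4.b and 4.c above give minimum node counts for the generated IP list
files. A minimal sketch of that check follows. The function name
check_node_count is hypothetical, and the minimum is passed in by the caller
rather than derived from the rack type.

```shell
#!/bin/sh
# check_node_count <file> <minimum>: count the non-empty lines in an IP list
# file and fail if there are fewer than <minimum>.
check_node_count() {
    file=$1
    min=$2
    if [ ! -r "$file" ]; then
        echo "FAIL: cannot read $file"
        return 2
    fi
    # grep -c exits non-zero when the count is 0, so do not let that abort a caller
    count=$(grep -c '[^[:space:]]' "$file" || true)
    if [ "$count" -lt "$min" ]; then
        echo "FAIL: $file lists $count nodes, expected at least $min"
        return 1
    fi
    echo "OK: $file lists $count nodes"
}
```

Usage: check_node_count dbips.ora 8 for a full DBM, or check_node_count dbips.ora 4
for a half DBM, before running infinicheck.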
     
  Details for Round 2:
  -----------------------------------------------------------------------

  1. Before starting the reimage for round 2, copy this file to another location:
      /opt/oracle.SupportTools/firstconf/factory_use_only.csv

  2. Edit the copied file to add the MAC addresses for nodes. Pay
     close attention to the order in which addresses are entered.
     The nodes are organized in top to bottom ordering in the rack.

     The existing MAC addresses are samples only and should be
     overwritten with the real MAC addresses.

     Populate these only for the type of Database Machine (DBM) being
     built. For example, fill in only the half rack section for a half
     rack. Leave the rest alone.

     Verify the basic form and content of the file with
      /opt/oracle.cellos/ipconf -preconf <the csv file> -verify

  3. The above edited file should be used as the preconf.csv file
     to reimage the nodes for Round 2.

  4. Once the nodes are up, login to the console (root/welcome1) of
     the bottommost DB node (X4170) in the rack. This should have
     come up with hostname fdata01 or hdata01 or qdata01 or bdata01
     corresponding to full, half or quarter DBM.

  5. cd /opt/oracle.SupportTools

  6. ./setup_ssh_eq.sh \
      /opt/oracle.SupportTools/firstconf/<full | half | quarter> \
      root \
      welcome1

  7. Check that root ssh equivalence was set up correctly in the above step
     by running a simple command over ssh:
      /usr/local/bin/dcli \
      -g /opt/oracle.SupportTools/firstconf/<full | half | quarter> \
      -l root \
      "hostname -i"

  8. Set the ILOM ip addresses as specified in
     /opt/oracle.SupportTools/firstconf/factory_use_only.csv

  9. Set the NM2 InfiniBand switch ip addresses as specified in
     /opt/oracle.SupportTools/firstconf/factory_use_only.csv

  10. Set the CISCO Ethernet switch ip addresses as specified in
     /opt/oracle.SupportTools/firstconf/factory_use_only.csv

  11. Once the above steps have passed, power off the nodes. They are ready to ship.
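
For step 7, reading the dcli output by eye gets error-prone on a full rack. The
sketch below compares a group file against saved dcli output and prints the
hosts that did not answer. It assumes that the group file holds one hostname
per line and that dcli prefixes each output line with "hostname: ". The
function name missing_hosts is hypothetical.

```shell
#!/bin/sh
# missing_hosts <group file> <saved dcli output>: print every host in the
# group file that has no "host: ..." line in the dcli output.
missing_hosts() {
    awk '
        # First file (dcli output): remember the host name before the first colon
        FILENAME == ARGV[1] { sub(/:.*/, "", $0); seen[$0] = 1; next }
        # Second file (group file): report non-empty lines whose host was not seen
        NF && !($1 in seen) { print $1 }
    ' "$2" "$1"
}
```

Usage: save the output of the dcli command from step 7 to a file, for example
/tmp/dcli.out, then run missing_hosts <group file> /tmp/dcli.out. No output
means every host responded.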

  Round 2 variation if Round 2 above does not work and there is time pressure
  ---------------------------------------------------------------------------

  This is the case where you ship the rack without setting up private IP
  addresses.
 
  Simply reimage the rack but this time use the options:

   Must use:
    1. stit
    2. updfrm
    3. factory
    4. notests=diskgroup
   Must NOT use:
    1. preconf - based on second round template
    2. multiprof
    3. reboot-on-success
 
  What happens at customer site when machine ships with private ips
  -----------------------------------------------------------------

   At the customer site, to apply the real configuration, the customer prepares
   a preconf.csv file that has all the correct content.

   Then,
   0. Log in to the bottommost DB node as root/welcome1
   1. cd /opt/oracle.SupportTools/firstconf
   2. Copy the customer's preconf.csv file onto this node
   3. ./applyconfig.sh \
       <full | half | quarter> \
       <full path to preconf.csv file e.g. /root/preconf.csv>
   4. This will push the configuration to all nodes and reboot them.
       
  Solaris installation 
  --------------------------------------------------------------------------

   To install Solaris you first have to install or reinstall Linux with the
   dualboot option and make sure that the Linux installation succeeds. You should
   use the ForFactorySolaris zip file to deploy the PXE server. The README file in
   the factory zip explains the details.

   You also have the option to use the iso for the Solaris installation. ISO based
   installation is completely unattended, so you need only boot from the disk
   and wait until the machine reboots after the successful installation. It may
   take up to several hours. You can check the status of the process:

   0. Log in to the DB node as root/welcome 
   1. tail -f /tmp/install_log 

   