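1. Reset ILOM and restart Exadata
To help the upgrade go smoothly, it is best to restart the entire Exadata once beforehand. The order is: log in to the ILOM management interface and reset the SP, stop the services on the cell nodes, reboot all cells, and, once they are back up, reboot the compute nodes. The customer's Exadata has five ILOM management interfaces, one for each of the two compute nodes and the three storage cells. They are reached by URL; because of the firewall, the network administrator has to open the ports before they can be accessed. In the management interface choose Maintenance, then Reset SP. After a short wait you can reconnect. The five ILOM management addresses are: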
gxx2db01-ilom https://10.100.84.118
gxx2db02-ilom https://10.100.84.119
gxx2cel01-ilom https://10.100.84.126
gxx2cel02-ilom https://10.100.84.127
gxx2cel03-ilom https://10.100.84.128
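Before asking the network administrator to change anything, it can help to see which ILOM addresses already answer. The following is a minimal sketch, not part of the original procedure: `check_ilom_reachable` is a hypothetical helper, and it assumes `curl` is available and that the ILOM presents a self-signed certificate (hence `-k`).

```shell
# Hypothetical helper: report which ILOM web interfaces answer over HTTPS.
# Reads "name url" pairs from stdin, one pair per line.
check_ilom_reachable() {
    rc=0
    while read -r name url; do
        [ -n "$name" ] || continue
        # -k: accept a self-signed certificate (assumption about the ILOM);
        # -s: quiet; --connect-timeout 5: give up after five seconds.
        if curl -k -s -o /dev/null --connect-timeout 5 "$url" </dev/null; then
            echo "$name reachable"
        else
            echo "$name NOT reachable"
            rc=1
        fi
    done
    return $rc
}
```

For example, `printf 'gxx2db01-ilom https://10.100.84.118\n' | check_ilom_reachable` checks a single address; the function returns non-zero if any address does not answer.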
On the storage nodes, first stop the cell services by running the following command on each cell server:
cellcli -e alter cell shutdown services all
After the shutdown completes, check that all cell services have actually stopped:
cellcli -e list cell attributes msstatus,cellsrvstatus,rsstatus
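The status check above can be turned into a pass/fail test. This is a rough sketch under two assumptions: that the command prints the three status values as whitespace-separated words with no hostname prefix, and that a stopped service is reported as `stopped`. `all_services_in_state` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if every status word on stdin equals $1.
all_services_in_state() {
    want=$1
    awk -v want="$want" '
        NF == 0 { next }                       # ignore blank lines
        {
            seen = 1                           # at least one status line was read
            for (i = 1; i <= NF; i++)
                if ($i != want) bad = 1        # any other state is a failure
        }
        END { exit (seen && !bad) ? 0 : 1 }
    '
}
```

Usage would look like `cellcli -e list cell attributes msstatus,cellsrvstatus,rsstatus | all_services_in_state stopped`. The same helper can be called with `running` after the reboot.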
Reboot the storage node host:
sync
reboot
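Once the storage nodes have finished rebooting, check that the cell services started successfully. If they did, the compute nodes can be rebooted. The database and the cluster software were already stopped during the earlier software backup; if they are still running, stop the database first, then the cluster software, and only then reboot the compute nodes.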
sync
reboot
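2. Check the trust relationship
For a smooth upgrade, the compute nodes must be able to establish a secure trust relationship with the storage nodes, which is done through SSH. First create a file named all_group under /tmp listing the hostnames of the two compute nodes and the three storage nodes. Then create a file named cell_group listing the hostnames of the three storage nodes. Finally run the following commands; if they print their results without prompting for a password, the trust relationship is working.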
[root@gxx2db01 tmp]# dcli -g all_group -l root date
gxx2db01: Sat Sep 6 12:14:41 CST 2014
gxx2db02: Sat Sep 6 12:14:40 CST 2014
gxx2cel01: Sat Sep 6 12:14:41 CST 2014
gxx2cel02: Sat Sep 6 12:14:41 CST 2014
gxx2cel03: Sat Sep 6 12:14:41 CST 2014
[root@gxx2db01 tmp]# dcli -g cell_group -l root 'hostname -i'
gxx2cel01: 10.100.84.104
gxx2cel02: 10.100.84.105
gxx2cel03: 10.100.84.106
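The two group files used above are plain text with one hostname per line. A minimal sketch for creating them follows; `write_group_files` is a hypothetical helper, and the hostnames are the ones shown in the output above.

```shell
# Sketch: write the two dcli group files into a directory (default /tmp).
write_group_files() {
    dir=${1:-/tmp}
    # cell_group: the three storage nodes only.
    printf '%s\n' gxx2cel01 gxx2cel02 gxx2cel03 > "$dir/cell_group"
    # all_group: the two compute nodes followed by the storage nodes.
    { printf '%s\n' gxx2db01 gxx2db02; cat "$dir/cell_group"; } > "$dir/all_group"
}
```

Calling `write_group_files /tmp` overwrites any existing all_group and cell_group files in that directory.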
If the trust relationship is broken, rebuild it with the following commands:
ssh-keygen -t rsa
dcli -g cell_group -l root -k
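To confirm the result without risking an interactive password prompt, plain `ssh` can be used in batch mode. This is a sketch rather than part of the original procedure; `check_ssh_trust` is a hypothetical helper that reads a group file in the one-hostname-per-line format.

```shell
# Hypothetical helper: verify key-based root SSH to every host in a group file.
check_ssh_trust() {
    group_file=$1
    rc=0
    while read -r host; do
        [ -n "$host" ] || continue
        # BatchMode=yes makes ssh fail instead of asking for a password.
        # stdin is redirected so ssh does not swallow the rest of the host list.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true </dev/null; then
            echo "$host: ok"
        else
            echo "$host: trust missing"
            rc=1
        fi
    done < "$group_file"
    return $rc
}
```

Running `check_ssh_trust /tmp/cell_group` prints one line per host and returns non-zero if any host still requires a password.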
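3. Upgrade the LSI Disk Array Controller Firmware
The LSI Disk Array Controller Firmware can be installed in rolling or non-rolling mode. Because we had requested a downtime window, we used non-rolling mode for this operation.
1) Upload the installation media FW12120140.zip to the /tmp directory on every cell node.
2) Unzip FW12120140.zip.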
[root@gxx2db01 tmp]# unzip FW12120140.zip -d /tmp
[root@gxx2db01 tmp]# mkdir -p /tmp/firmware
[root@gxx2db01 tmp]# tar -pjxf FW12120140.tbz -C /tmp/firmware
The following file should now exist under /tmp/firmware:
12.12.0.0140_AF2108_FW_Image.rom 5ff5650dd92acd4e62530bf72aa9ea83
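The 32-character hexadecimal string next to the file name looks like an MD5 checksum. Assuming that is what it is, the extracted image could be verified with a small wrapper around `md5sum`; `verify_md5` is a hypothetical helper.

```shell
# Hypothetical helper: compare a file's MD5 checksum with an expected value.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<checksum>  <file>"; keep only the checksum.
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "checksum OK: $file"
    else
        echo "checksum MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}
```

On a cell node this would be called as `verify_md5 /tmp/firmware/12.12.0.0140_AF2108_FW_Image.rom 5ff5650dd92acd4e62530bf72aa9ea83`.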
3) Verify the FW12120140.sh script
#!/bin/ksh
date > /tmp/manual_fw_update.log
logfile=/tmp/manual_fw_update.log
HWModel=`dmidecode --string system-product-name | tail -1 | sed -e 's/[ \t]\+$//g;s/ /_/g'`
silicon_ver_lsi_card="`lspci 2>/dev/null | grep 'RAID' | grep LSI | awk '{print $NF}' | sed -e 's/03)/B2/g;s/05)/B4/g;'`"
silicon_ver_lsi_card=`echo $silicon_ver_lsi_card | sed -e 's/B2/B4/g'`
lsi_card_firmware_file="SUNDiskControllerFirmware_${silicon_ver_lsi_card}"
echo $lsi_card_firmware_file
echo "`date '+%F %T'`: Now updating the disk controller firmware ..." | tee -a $logfile
echo "`date '+%F %T'`: Now disabling cache of the disk controller ..." | tee -a $logfile
sync
/opt/MegaRAID/MegaCli/MegaCli64 -AdpCacheFlush -aALL -NoLog | tee -a $logfile
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp WT -Lall -a0 -NoLog | tee -a $logfile
/opt/MegaRAID/MegaCli/MegaCli64 -AdpCacheFlush -aALL -NoLog | tee -a $logfile
/opt/MegaRAID/MegaCli/MegaCli64 -v | tee -a $logfile
/opt/MegaRAID/MegaCli/MegaCli64 -AdpFwFlash -f /tmp/firmware/12.12.0.0140_AF2108_FW_Image.rom -NoVerChk -a0 -Silent -AppLogFile /tmp/manual_fw_update.log
if [ $? -ne 0 ]; then
echo "`date '+%F %T'`: [ERROR] Failed to update the Disk Controller firmware. Will continue anyway ..." | tee -a $logfile
else
echo "`date '+%F %T'`: [INFO] Disk controller firmware update command completed successfully." | tee -a $logfile
fi
Give the script 700 permissions:
chmod 700 /tmp/FW12120140.sh
4) Stop the databases and CRS
[oracle@gxx2db01 ~]$ srvctl stop instance -i orcl1 -d orcl
[oracle@gxx2db01 ~]$ srvctl stop instance -i orcl2 -d orcl
[oracle@gxx2db01 ~]$ srvctl stop instance -i gxypdb1 -d gxypdb
[oracle@gxx2db01 ~]$ srvctl stop instance -i gxypdb2 -d gxypdb
[oracle@gxx2db01 ~]$ srvctl stop instance -i jjscpd1 -d jjscpd
[oracle@gxx2db01 ~]$ srvctl stop instance -i jjscpd2 -d jjscpd
[root@gxx2db01 ~]# /u01/app/11.2.0.3/grid/bin/crsctl stop crs -f
[root@gxx2db01 ~]# /u01/app/11.2.0.3/grid/bin/crsctl check crs
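As an alternative to stopping each instance by name, `srvctl stop database -d <name>` stops all instances of a database in one call. Note that this is a different command from the per-instance ones used above. The sketch below only prints the commands unless `DRY_RUN=0` is set; `stop_databases` and `DRY_RUN` are hypothetical names introduced for illustration.

```shell
# Sketch: stop whole databases instead of individual instances.
# DRY_RUN defaults to 1, in which case the commands are only printed.
stop_databases() {
    for db in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "srvctl stop database -d $db"
        else
            srvctl stop database -d "$db"
        fi
    done
}
```

With the three databases from this environment the call would be `stop_databases orcl gxypdb jjscpd`, run as the oracle user.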
5) Stop the services on all storage nodes
[root@gxx2db01 ~]# dcli -l root -g cell_group "cellcli -e alter cell shutdown services all"
6) Create the DISABLE_HARDWARE_FIRMWARE_CHECKS file
[root@gxx2db01 ~]# dcli -l root -g cell_group "touch /opt/oracle.cellos/DISABLE_HARDWARE_FIRMWARE_CHECKS"
7) Disable the exachkcfg service
[root@gxx2db01 ~]# dcli -l root -g cell_group "chkconfig exachkcfg off"
8) Run the FW12120140.sh script on the cell nodes
[root@gxx2cel01 tmp]# /tmp/FW12120140.sh
SUNDiskControllerFirmware_B4
2014-09-06 11:15:31: Now updating the disk controller firmware ...
2014-09-06 11:15:31: Now disabling cache of the disk controller ...
Cache Flush is successfully done on adapter 0.
Exit Code: 0x00
Set Write Policy to WriteThrough on Adapter 0, VD 0 (target id: 0) success
Set Write Policy to WriteThrough on Adapter 0, VD 1 (target id: 1) success
Set Write Policy to WriteThrough on Adapter 0, VD 2 (target id: 2) success
Set Write Policy to WriteThrough on Adapter 0, VD 3 (target id: 3) success
Set Write Policy to WriteThrough on Adapter 0, VD 4 (target id: 4) success
Set Write Policy to WriteThrough on Adapter 0, VD 5 (target id: 5) success
Set Write Policy to WriteThrough on Adapter 0, VD 6 (target id: 6) success
Set Write Policy to WriteThrough on Adapter 0, VD 7 (target id: 7) success
Set Write Policy to WriteThrough on Adapter 0, VD 8 (target id: 8) success
Set Write Policy to WriteThrough on Adapter 0, VD 9 (target id: 9) success
Set Write Policy to WriteThrough on Adapter 0, VD 10 (target id: 10) success
Set Write Policy to WriteThrough on Adapter 0, VD 11 (target id: 11) success
Exit Code: 0x00
Cache Flush is successfully done on adapter 0.
Exit Code: 0x00
MegaCLI SAS RAID Management Tool Ver 8.02.21 Oct 21, 2011
(c)Copyright 2011, LSI Corporation, All Rights Reserved.
Exit Code: 0x00
95% Completed
2014-09-06 11:16:09: [INFO] Disk controller firmware update command completed successfully.
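Because the script prints "Will continue anyway" on failure, it is worth checking the log explicitly before rebooting. The following sketch relies only on the two messages the script itself writes; `fw_update_succeeded` is a hypothetical helper.

```shell
# Hypothetical helper: decide from the script's log whether the flash succeeded.
fw_update_succeeded() {
    logfile=${1:-/tmp/manual_fw_update.log}
    # Any [ERROR] line written by the script means the flash failed.
    if grep -q '\[ERROR\]' "$logfile"; then
        return 1
    fi
    # Otherwise require the explicit success message.
    grep -q 'firmware update command completed successfully' "$logfile"
}
```

For example, `fw_update_succeeded && echo "safe to reboot"` reads the default log path used by the script.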
9) After the script succeeds, the node must be rebooted. Note that it needs to be rebooted twice.
[root@gxx2cel01 tmp]# sync
[root@gxx2cel01 tmp]# shutdown -fr now
10) After the reboot completes, check the LSI MegaRaid Disk Controller Firmware version.
[root@gxx2cel01 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -a0 -NoLog | grep 'FW Package Build'
FW Package Build: 12.12.0-0079
FW Version : 2.120.203-1440
Current Size of FW Cache : 399 MB
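To compare the reported version against the one you expect on every cell, the build string can be extracted from this output. This is a small sketch that assumes the `FW Package Build: <version>` layout shown above; `fw_package_build` is a hypothetical helper.

```shell
# Hypothetical helper: print the value of the "FW Package Build" line on stdin.
fw_package_build() {
    awk -F':' '/FW Package Build/ {
        v = $2
        gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim surrounding whitespace
        print v
        exit                             # only the first match is needed
    }'
}
```

It would be used as `/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -a0 -NoLog | fw_package_build`, and the result compared with the target version in a shell test.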
11) After a successful upgrade, remove the DISABLE_HARDWARE_FIRMWARE_CHECKS file
[root@gxx2cel01 ~]# dcli -l root -g cell_group "rm -fr /opt/oracle.cellos/DISABLE_HARDWARE_FIRMWARE_CHECKS"
12) Re-enable the exachkcfg service
[root@gxx2cel01 ~]# dcli -l root -g cell_group "chkconfig exachkcfg on"
13) Check the cell service status
[root@gxx2cel01 ~]# dcli -l root -g cell_group "cellcli -e list cell attributes msstatus,cellsrvstatus,rsstatus"
running running running
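Starting from step 5, repeat the steps above on the other storage nodes. Once every node is done and has been verified to carry a valid LSI MegaRaid Disk Controller Firmware, restart the services on all storage nodes.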