Today a disk (EMC DMX4) was added to an ASM disk group on our core system, and both nodes of the RAC database promptly crashed, which left us in a cold sweat.
The logs showed:
NOTE: Disk in mode 0x8 marked for de-assignment
ERROR: diskgroup DGIDX1 was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "16" is missing from group number "4"
ERROR: ALTER DISKGROUP DGIDX1 MOUNT /* asm agent *//* {1:8345:41140} */
Thu Nov 06 15:17:41 2014
Errors in file /oraclelog/grid/diag/asm/+asm/+ASM1/trace/+ASM1_pz99_22545054.trc:
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 16: Device busy
Additional information: -1
Additional information: 4096
WARNING: Read Failed. group:0 disk:10 AU:0 offset:0 size:4096
Errors in file /oraclelog/grid/diag/asm/+asm/+ASM1/trace/+ASM1_pz99_22545054.trc:
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 16: Device busy
Additional information: -1
Additional information: 4096
WARNING: Read Failed. group:0 disk:9 AU:0 offset:0 size:4096
Errors in file /oraclelog/grid/diag/asm/+asm/+ASM1/trace/+ASM1_pz99_22545054.trc:
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 16: Device bus
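To see at a glance which disks the failed reads refer to, the "Read Failed" warnings can be pulled out of the alert log. This is only a minimal sketch and was not part of the original troubleshooting: the function name `failed_disks` is made up for illustration, and the pattern assumes the warning format seen in the excerpt above.

```shell
#!/bin/sh
# List the distinct (group, disk) pairs named in "Read Failed" warnings.
# Reads ASM alert log text on stdin. The line format is an assumption taken
# from the log excerpt; other ASM versions may word the warning differently.
failed_disks() {
    sed -n 's/.*Read Failed\. group:\([0-9][0-9]*\) disk:\([0-9][0-9]*\).*/group=\1 disk=\2/p' |
        sort -u
}

# Demo on two warning lines copied from the excerpt.
failed_disks <<'EOF'
WARNING: Read Failed. group:0 disk:10 AU:0 offset:0 size:4096
WARNING: Read Failed. group:0 disk:9 AU:0 offset:0 size:4096
EOF
```

In practice the function would be fed the real alert log, for example `failed_disks < alert_+ASM1.log`, where the file name is a placeholder.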
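The newly added disk was unusable. Both nodes then attempted ASM and database instance recovery, and the second node did come back up, so the remaining problem was that the first node could not read the disk. The likely cause was a storage-side lock on this LUN for that host, or a SAN path problem. At this point the v$asm_operation view in the ASM instance on the second node returned no rows, which suggested that the rebalance for the new disk had already finished. To get this production system back online as quickly as possible, we decided to drop the problematic LUN from the disk group through the ASM instance on the second node and restore the original configuration. After performing the operation in asmca, we checked the rebalance progress: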
SQL> select * from v$asm_operation;

GROUP_NUMBER OPERA STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE
------------ ----- ---- ---------- ---------- ---------- ---------- ----------
EST_MINUTES ERROR_CODE
----------- --------------------------------------------
           4 REBAL RUN           1          1      36659      49690       2899
          4
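As a rough aid for reading this output, the progress and remaining time can be recomputed from the columns shown. The sketch below assumes that SOFAR and EST_WORK count allocation units and that EST_RATE is allocation units per minute, as Oracle documents for V$ASM_OPERATION; the function names are hypothetical.

```shell
#!/bin/sh
# Percentage of the rebalance completed: 100 * SOFAR / EST_WORK.
rebalance_pct() {
    awk -v sofar="$1" -v est="$2" 'BEGIN {
        if (est <= 0) { print "n/a"; exit }      # avoid dividing by zero
        printf "%.1f\n", 100 * sofar / est
    }'
}

# Whole minutes left: (EST_WORK - SOFAR) / EST_RATE, truncated.
rebalance_minutes_left() {
    awk -v sofar="$1" -v est="$2" -v rate="$3" 'BEGIN {
        if (rate <= 0) { print "n/a"; exit }     # no rate reported yet
        printf "%d\n", (est - sofar) / rate
    }'
}

# Values taken from the query output above.
rebalance_pct 36659 49690                # prints 73.8
rebalance_minutes_left 36659 49690 2899  # prints 4
```

The second result agrees with the EST_MINUTES value of 4 reported by the view.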
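Roughly 49 GB of data had to be moved in total. Once SOFAR reached EST_WORK, the LUN was dropped successfully, and the ASM instance on the first node then started successfully as well. Lesson learned: before the database actually starts using a disk, it is very important to verify that the device is usable. Oracle ACS also pointed us to a tool, kfod (in $GRID_HOME/bin), which can quickly check whether a LUN is valid. Mr. Gai has also written a short introduction to it: kfod in oracle_asm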
# On node 2, the disk's information can be found
$ kfod disk=all |grep 113
 139:      51930 Mb /dev/rhd113 grid asmadmin
# On node 1, the disk's information cannot be found, which shows that Oracle GI cannot correctly recognize this LUN.
$ kfod disk=all |grep 113
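The same comparison can be made for every disk at once rather than one grep at a time. The following is a hedged sketch, not something from the original incident: it assumes the output of `kfod disk=all` has been saved to a file on each node, that rows follow the layout seen above (`<n>: <size> Mb <path> <owner> <group>`), and the function name and file names are invented for illustration.

```shell
#!/bin/sh
# Print device paths that appear in the first saved kfod output
# but not in the second one.
#   $1: file with `kfod disk=all` output from the node that sees the disks
#   $2: file with `kfod disk=all` output from the other node
missing_on_other_node() {
    awk -v other="$2" '
        BEGIN {
            # Collect every device path the other node reports.
            while ((getline line < other) > 0) {
                split(line, f, " ")
                if (f[1] ~ /^[0-9]+:$/ && f[3] == "Mb") seen[f[4]] = 1
            }
            close(other)
        }
        # Data rows only: headers and separator lines do not match.
        $1 ~ /^[0-9]+:$/ && $3 == "Mb" && !($4 in seen) { print $4 }
    ' "$1"
}

# Hypothetical usage, after running `kfod disk=all > kfod.<node>.txt` on each node:
#   missing_on_other_node kfod.node2.txt kfod.node1.txt
```

Any path printed is a device that one node can discover and the other cannot, which is worth resolving before the disk is added to a disk group.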
>o<
Original article: "ASM instance crash caused by adding a disk". Thanks to the original author for sharing.