[oracle@node1 crsd]$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
[oracle@node1 crsd]$ crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
[root@node1 crs]# ps -ef|grep crs
root 3926 1 0 17:46 ? 00:00:00 /bin/sh /etc/init.d/init.crsd run
root 29408 25855 0 22:09 pts/1 00:00:00 grep crs
[root@node1 bin]# ./racgvip
There is no VIP name
[root@node1 crsd]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [Device or resource busy] [16]
Shutdown has begun. The daemons should exit soon.
[root@node1 crsd]# raw -qa
/dev/raw/raw1: bound to major 8, minor 17
/dev/raw/raw2: bound to major 8, minor 33
[root@node1 crsd]# ls -al /dev/raw/raw2
crw-rw---- 1 oracle dba 162, 2 9月 15 17:45 /dev/raw/raw2
[root@node1 bin]# ./crsctl query css votedisk
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [Device or resource busy] [16]
[root@node1 bin]# ./ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
[root@node1 ~]# ll /etc/oracle/ocr.loc
-rw-r--r-- 1 root oinstall 45 2012-01-17 /etc/oracle/ocr.loc
[root@node1 bin]# more /etc/oracle/ocr.loc
ocrconfig_loc=/dev/raw/raw1
local_only=FALSE
[root@node1 ~]# dd if=/dev/raw/raw1 of=/opt/oracle/ocr_raw.bak
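dd: opening '/dev/raw/raw1': Device or resource busy
lsof|grep /dev/raw/raw1
Nothing is holding the device.
I wanted to wipe the partition behind raw1. While doing so I found that sdb1 was actually 10.7 GB, not the 100 MB raw device.
A system administrator came over to help.
Running fdisk on sdb broke the file system at boot. After entering the root password at the boot prompt I could run fdisk on sdb again,
partition the 10.7 GB sdb as sdb1, partition the raw device disk as sdc1, and format with mkfs.ext3 /dev/sdb1.
That got me back into the system. I then corrected the device mappings in /etc/sysconfig/rawdevices.
After another reboot, dd could back up the contents of /dev/raw/raw1 without reporting an error.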
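The mismatch found above (raw1 sitting on a 10.7 GB partition) can also be read off the earlier raw -qa output. On Linux, major 8 is the SCSI disk driver and each disk owns 16 minor numbers, so major 8, minor 17 is sdb1 and minor 33 is sdc1. The following is a minimal sketch of that decoding. The function name decode_raw_bindings is made up for illustration, and it only understands major 8 (the first 16 sd disks).

```shell
# Translate "raw -qa" lines into block device names.
# Assumption: only major 8 (sda..sdp) is decoded; anything else is reported as-is.
decode_raw_bindings() {
    awk '
    /bound to major/ {
        raw = $1;   sub(/:$/, "", raw)      # "/dev/raw/raw1:" -> "/dev/raw/raw1"
        major = $5; sub(/,$/, "", major)    # "8," -> "8"
        minor = $7
        if (major + 0 != 8) {
            print raw " -> (major " major ", not decoded)"
            next
        }
        disk = int(minor / 16)              # 16 minors per sd disk
        part = minor % 16                   # 0 means the whole disk
        name = "sd" substr("abcdefghijklmnop", disk + 1, 1)
        if (part > 0) name = name part
        print raw " -> /dev/" name
    }'
}

# Example with the two lines captured earlier in this post.
decode_raw_bindings <<'EOF'
/dev/raw/raw1: bound to major 8, minor 17
/dev/raw/raw2: bound to major 8, minor 33
EOF
```

On a live system the same function can be fed directly with raw -qa | decode_raw_bindings, and the result compared with the sizes reported by fdisk -l.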
[root@node1 tmp]# dd if=/dev/zero of=/dev/raw/raw1 bs=512 count=2048
2048+0 records in
2048+0 records out
[root@node1 tmp]# dd if=/dev/zero of=/dev/raw/raw2 bs=512 count=2048
2048+0 records in
2048+0 records out
The raw devices are now working normally.
No new errors appeared under /tmp.
Stop CRS:
[root@node1 ~]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage
Shutdown has begun. The daemons should exit soon.
Run the OCR restore:
ocrconfig -restore /opt/oracle/crshome/product/10.2.0/db_1/cdata/crs/backup00.ocr
No response.
Check the OCR log:
cd /opt/oracle/crshome/product/10.2.0/db_1/log/node1/client
[root@node1 client]# cat ocrconfig_6090.log
Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2012-09-19 10:51:08.056: [ OCRCONF][3086915264]ocrconfig starts...
2012-09-19 10:51:08.109: [ OCROSD][3086915264]utopen:12:Not enough space in the backing store
2012-09-19 10:51:08.109: [ OCROSD][3086915264]utopen:10:None of the OCR devices are usable
2012-09-19 10:51:08.109: [ OCRRAW][3086915264]phy_rec:1:could not open OCR device
2012-09-19 10:51:08.109: [ OCRCONF][3086915264]Failed to restore OCR from [/opt/oracle/crshome/product/10.2.0/db_1/cdata/crs/backup00.ocr]
2012-09-19 10:51:08.109: [ OCRCONF][3086915264]Exiting [status=failed]...
Probably a permissions problem:
[root@node1 client]# ll /dev/raw/raw*
crw-rw---- 1 root disk 162, 1 9月 18 18:41 /dev/raw/raw1
crw-rw---- 1 root disk 162, 2 9月 18 18:41 /dev/raw/raw2
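The permissions had been changed on purpose earlier: OCR kept running and holding the raw device, so dd could not read it and reported it as busy.
Temporarily disable automatic startup of the CRS daemons: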
[root@node1 opt]# vi /etc/inittab
# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon
#h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1
#h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1
#h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1
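The commented-out h1 to h3 entries above can be produced without hand-editing. This is a minimal sketch, assuming the three entries look like the ones shown here. comment_crs_inittab is a hypothetical helper, and it prints the result to stdout instead of touching /etc/inittab so the change can be reviewed first.

```shell
# Print an inittab-style file with the evmd/cssd/crsd respawn entries commented out.
# Lines that are already commented start with "#", so "^h" does not match them.
comment_crs_inittab() {
    sed -e '/init\.d\/init\.evmd/s/^h/#h/' \
        -e '/init\.d\/init\.cssd/s/^h/#h/' \
        -e '/init\.d\/init\.crsd/s/^h/#h/' "$1"
}

# Example run against a scratch copy, not the live file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
x:5:respawn:/etc/X11/prefdm -nodaemon
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1
EOF
comment_crs_inittab "$sample"
rm -f "$sample"
```

Writing to stdout keeps the original file intact; the output can be diffed against /etc/inittab before anything is replaced.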
A colleague pointed out that the partitioning was still wrong:
Disk /dev/sdc: 107 MB, 107374080 bytes
64 heads, 32 sectors/track, 102 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Device Boot Start End Blocks Id System
/dev/sdc1 102 102 1024 83 Linux
Disk /dev/sdd: 107 MB, 107374080 bytes
64 heads, 32 sectors/track, 102 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Device Boot Start End Blocks Id System
/dev/sdd1 * 1 102 104432 83 Linux
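In the listing above, sdc1 starts and ends on cylinder 102 and holds only 1024 blocks, while sdd1 spans the whole disk. That may well be what the earlier "Not enough space in the backing store" message was pointing at, though the post does not confirm it. The sketch below flags partitions under a chosen size in fdisk -l style output. small_partitions is a hypothetical helper, and the 100000-block threshold is only an illustrative value based on the 100 MB raw device mentioned earlier, not an Oracle requirement.

```shell
# Print "device blocks" for every partition smaller than $1 blocks.
# Reads fdisk -l style partition lines from stdin.
small_partitions() {
    awk -v min="$1" '
    $1 ~ /^\/dev\// {
        i = 2
        if ($i == "*") i++          # skip the optional boot flag column
        blocks = $(i + 2)           # columns: start, end, blocks
        sub(/\+$/, "", blocks)      # fdisk may append "+" to the block count
        if (blocks + 0 < min + 0) print $1, blocks
    }'
}

# Example with the two partition lines from this post.
small_partitions 100000 <<'EOF'
/dev/sdc1 102 102 1024 83 Linux
/dev/sdd1 * 1 102 104432 83 Linux
EOF
```

On a live system it could be fed with fdisk -l /dev/sdc /dev/sdd | small_partitions 100000.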
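Repartitioned with fdisk /dev/sdc.
Re-imported the raw device contents from the image exported earlier (raw1.file).
Ran ocrconfig -restore /opt/oracle/crshome/product/10.2.0/db_1/cdata/crs/backup00.ocr again.
No response. Infuriating!
Rebooting the system did not help either.
The next day I decided to work on node 2. Node 2 reported the same error, because the new disk had been added on SCSI bus 0 and the device names had shifted as a result.
Unlike node 1, it had not been operated on by my two colleagues.
Node 2 booted.
Edit /etc/sysconfig/rawdevices: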
[root@node2 ~]# cat /etc/sysconfig/rawdevices
# This file and interface are deprecated.
# Applications needing raw device access should open regular
# block devices with O_DIRECT.
# raw device bindings
# format:  <rawdev> <major> <minor>
#          <rawdev> <blockdev>
# example: /dev/raw/raw1 /dev/sda1
#          /dev/raw/raw2 8 5
/dev/raw/raw1 /dev/sdc1
/dev/raw/raw2 /dev/sdd1
[root@node2 ~]# service rawdevices restart
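Since the trouble on both nodes started with device names shifting after a disk was added, it may help to check that every block device named in the bindings file still exists before restarting the service. This is a minimal sketch with hypothetical helper names (list_bindings, check_bindings). It only handles the two-column "rawdev blockdev" form used above and ignores the "rawdev major minor" form. It also says nothing about whether the device is the right size.

```shell
# Print the active "rawdev blockdev" pairs, skipping comments and blank lines.
list_bindings() {
    awk '!/^[ \t]*#/ && NF == 2 { print $1, $2 }' "$1"
}

# Report whether each bound block device currently exists.
check_bindings() {
    list_bindings "$1" | while read -r rawdev blockdev; do
        if [ -b "$blockdev" ]; then
            echo "ok      $rawdev -> $blockdev"
        else
            echo "MISSING $rawdev -> $blockdev"
        fi
    done
}

# Example against a scratch copy of the file shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# raw device bindings
/dev/raw/raw1 /dev/sdc1
/dev/raw/raw2 /dev/sdd1
EOF
check_bindings "$sample"
rm -f "$sample"
```

A MISSING line means the name in the file no longer matches any disk.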
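OCR still did not work afterwards. After a system reboot it was fine.
crsctl check crs: all three daemons OK.
crs_stat -t: everything on node 2 OK.
My plan was to let node 2 repair the contents of the OCR disk automatically, so that node 1 could read a correct OCR and start successfully.
Shut down node 2:
crsctl stop crs (the virtual machine was rather busy)
Started node 1: everything as before. OCR wrote no logs under /tmp or the client directory, and there were no CRS logs either.
Infuriating. Had the OCR binaries been damaged? Surely not. I started node 2 again and compared the files one by one:
ll /dev/raw/raw* (permissions)
cat /etc/sysconfig/rawdevices (device names)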
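Today I brought along the book 大話RAC and turned to the OCR tools part of chapter 6, page 163, where I saw how to configure whether the CRS stack starts automatically.
It says that the crsctl disable crs command actually modifies the following file:
/etc/oracle/scls_scr/dbp/root/crsstart
(replace dbp with node1)
Comparing the file on the two nodes: node 2 says enable, node 1 says disable.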
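The comparison above is easy to script once both copies of crsstart are in one place. This is a minimal sketch, assuming the two files have been copied to local paths (the names in the example are placeholders) and that each holds a single word such as enable or disable, as observed here. compare_flag_files is a hypothetical helper.

```shell
# Compare two single-word flag files, ignoring whitespace and line endings.
# Prints "same: <value>" or "differ: <file1>=<v1> <file2>=<v2>" (and returns 1).
compare_flag_files() {
    a=$(tr -d ' \t\r\n' < "$1")
    b=$(tr -d ' \t\r\n' < "$2")
    if [ "$a" = "$b" ]; then
        echo "same: $a"
    else
        echo "differ: $(basename "$1")=$a $(basename "$2")=$b"
        return 1
    fi
}

# Example with scratch files standing in for the copies from each node.
dir=$(mktemp -d)
echo disable > "$dir/node1.crsstart"
echo enable  > "$dir/node2.crsstart"
compare_flag_files "$dir/node1.crsstart" "$dir/node2.crsstart" || true
rm -rf "$dir"
```

The non-zero return on a difference lets the check be used in a larger comparison loop over several configuration files.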
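I remembered that a colleague had told me to stop CRS on node 1 from starting by itself. Fine: I changed the value to enable and rebooted node 1.
ps no longer shows just /etc/init.d/init.crsd run but a whole set of processes: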
[root@node1 ~]# ps -ef | grep crs*
root 3392 1 0 15:38 ? 00:00:00 crond
root 3427 1 0 15:38 ? 00:00:00 anacron -s
root 4045 1 0 15:38 ? 00:00:00 /bin/su -l oracle -c sh -c 'ulimit -c unlimited; cd /opt/oracle/crshome/product/10.2.0/db_1/log/node1/evmd; exec /opt/oracle/crshome/product/10.2.0/db_1/bin/evmd '
root 4052 1 1 15:38 ? 00:00:08 /opt/oracle/crshome/product/10.2.0/db_1/bin/crsd.bin reboot
oracle 4773 4045 0 15:39 ? 00:00:01 /opt/oracle/crshome/product/10.2.0/db_1/bin/evmd.bin
root 4890 4752 0 15:39 ? 00:00:00 /bin/su -l oracle -c /bin/sh -c 'ulimit -c unlimited; cd /opt/oracle/crshome/product/10.2.0/db_1/log/node1/cssd;
[root@node1 ~]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
[root@node1 ~]# su - oracle
[oracle@node1 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....C1.inst application ONLINE ONLINE node1
ora....C2.inst application ONLINE ONLINE node2
ora.MYRAC.db application ONLINE ONLINE node2
ora....SM1.asm application ONLINE ONLINE node1
ora....E1.lsnr application ONLINE ONLINE node1
ora.node1.gsd application ONLINE ONLINE node1
ora.node1.ons application ONLINE ONLINE node1
ora.node1.vip application ONLINE ONLINE node1
ora....SM2.asm application ONLINE ONLINE node2
ora....E2.lsnr application ONLINE ONLINE node2
ora.node2.gsd application ONLINE ONLINE node2
ora.node2.ons application ONLINE ONLINE node2
ora.node2.vip application ONLINE ONLINE node2
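Summary
1 When adding a disk, watch out for device names changing.
2 When partitioning, pay attention to the start and end values, especially when the prompts offer 1 for both while the partition is being created.
3 OCR comes up before CRS. If OCR cannot start, CRS cannot start either.
4 My two colleagues know the commands well and work fast, which makes it very easy to overlook details in the output.
5 Do not change CRS settings by trial and error, especially while the problem has not yet been pinpointed.
6 Record every change by hand, in a notebook or a Word document, because repeated changes and trial and error can easily wreck the environment.
7 This bug cost me a week. I asked many people, and the ones who really helped were two close colleagues. The people in the chat group supplied commands and files, which helped me get familiar with some Linux commands and configuration files. So when you cannot solve a problem alone, get some sleep or ask someone else. The onlooker sees more clearly than the player. After too long your head gets muddled and your eyes get tired, and you miss important information and hints.
8 Luckily this was a virtual machine. On a production system that has to be fixed quickly, amid noise, pressure and stuffy heat, I probably could not have solved it. Perhaps it is exactly under pressure that trial and error gets used and creates more problems.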