NetBackup setup over InfiniBand on Exadata:
In order to backup all the data within the Exadata compute nodes, we need to use a dedicated media server which is directly connected to the frame through an infiniband connection.
The storage admins must specify the infiniband IP as the target. However, the master server is not on the infiniband network. We must "trick" the master server to use the proper IP. In the local /etc/hosts file, associate the primary IP on a routeable network to the infiniband hostname (hostname-priv). The master server will try to backup the hostname-priv through the routable network. When the media servers recieves the request, it will lookup the IP associated with the hostname-priv, which will (on the media server) be the infiniband IP.
On the target exadata DB host, you must specify the media server name associated to the infiniband IP in the /etc/hosts file.
Example setup:
-- NetBackup Master Server /etc/hosts
# Aliases needed to trick exadata to backup over infiniband.
10.62.121.100 atl01db01-priv.corp.dwire.com atl01db01-priv
-- Media Server /etc/hosts
### Compute Node Private Interface details farm 1
192.168.10.1 atl01db01-priv.corp.dwire.com atl01db01-priv
-- DB server /etc/hosts
# EXADATA NBU Media Server Infiniband
192.168.16.89 atli1.corp.dwire.com atli1
-- The IPs of the example target server (atl01db01) are:
192.168.10.1 atl01db01-priv.corp.dwire.com atl01db01-priv
10.62.125.100 atl01db01.corp.dwire.com atl01db01
10.62.121.100 atl0101.corp.dwire.com atl0101
-- bp.conf file on DB server
atlcom01db01:
$ cat /opt/openv/netbackup/bp.conf
SERVER = atli1.corp.dwire.com
CLIENT_NAME = atlcom01db01-priv
Thursday, May 31, 2012
Tuesday, May 29, 2012
Configure InfiniBand Listener on Exadata:
Configure InfiniBand Listener on Exadata:
InfiniBand listener can be used for ETL's, RMAN active duplicate procedure etc.. for fast data transfer using IB network.
Make sure following entries are in sqlnet.ora file on all the database servers, we use soft link from GRID home to DB home.
AUTOMATIC_IPC = ON
names.directory_path = (TNSNAMES,EZCONNECT)
names.default_domain = world
name.default_zone = world
sqlnet.expire_time = 15
1.
Add/Identify 8 new IP addresses on the private Infiniband, which are different from existing "-priv" IPs (Same subnet as the IB network) -- Network Admin Task.
Add Private Virtutal IPs to /etc/hosts for all Exadata compute nodes and client (Informatica Servers or any other):
### Compute Node Listener Interface details
192.168.16.xx atl01db01-ibvip.corp.dwire.com atl01db01-ibvip
192.168.16.xx atl01db02-ibvip.corp.dwire.com atl01db02-ibvip
192.168.16.xx atl01db03-ibvip.corp.dwire.com atl01db03-ibvip
192.168.16.xx atl01db04-ibvip.corp.dwire.com atl01db04-ibvip
192.168.16.xx atl01db05-ibvip.corp.dwire.com atl01db05-ibvip
192.168.16.xx atl01db06-ibvip.corp.dwire.com atl01db06-ibvip
192.168.16.xx atl01db07-ibvip.corp.dwire.com atl01db07-ibvip
192.168.16.xx atl01db08-ibvip.corp.dwire.com atl01db08-ibvip
-- Infiniband Sample config:
cat /etc/sysconfig/network-scripts/ifcfg-bondib0
#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=bondib0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.16.1
NETMASK=255.255.252.0
NETWORK=192.168.16.0
BROADCAST=192.168.19.255
BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000 num_grat_arp=100"
IPV6INIT=no
MTU=65520
2.
As root user, add secondary network to the cluster:
-- Currenly How many networks configured? -- Only one network found
/u01/app/11.2.0.2/grid/bin/crsctl stat res -t | grep net
Output:
$ /u01/app/11.2.0.2/grid/bin/crsctl stat res -t | grep net
ora.net1.network
-- Add the second network
/u01/app/11.2.0.2/grid/bin/srvctl add network -k 2 -S 192.168.16.0/255.255.252.0/bondib0
-- Help 11.2.0.2
> srvctl add network -help
Adds a network configuration to the Oracle Clusterware.
Usage: srvctl add network [-k net_num] -S subnet/netmask/[if1[if2...]] [-w network_type] [-v]
-k net_num network number (default number is 1)
-S subnet/netmask/[if1[if2...]] NET address spec for network
-w network_type The network type (static, dhcp, mixed)
-h Print usage
-v Verbose output
-- Verify newly added network has been added
/u01/app/11.2.0.2/grid/bin/crsctl stat res -t | grep net
Output:
$ /u01/app/11.2.0.2/grid/bin/crsctl stat res -t | grep net
ora.net1.network
ora.net2.network
3.
-- Register newly created Virtual IP's with the CRS as root user:
cd /u01/app/11.2.0.2/grid/bin
srvctl add vip -n atl01db01 -A atl01db01-ibvip/255.255.252.0/bondib0 -k 2
srvctl add vip -n atl01db02 -A atl01db02-ibvip/255.255.252.0/bondib0 -k 2
srvctl add vip -n atl01db03 -A atl01db03-ibvip/255.255.252.0/bondib0 -k 2
srvctl add vip -n atl01db04 -A atl01db04-ibvip/255.255.252.0/bondib0 -k 2
srvctl add vip -n atl01db05 -A atl01db05-ibvip/255.255.252.0/bondib0 -k 2
srvctl add vip -n atl01db06 -A atl01db06-ibvip/255.255.252.0/bondib0 -k 2
srvctl add vip -n atl01db07 -A atl01db07-ibvip/255.255.252.0/bondib0 -k 2
srvctl add vip -n atl01db08 -A atl01db08-ibvip/255.255.252.0/bondib0 -k 2
4.
As oracle user:
-- Register newly added listener to CRS to make sure listener will come up automatically after the server restart.
srvctl add listener -l LISTENER_IB -k 2 -p TCP:1522
srvctl config listener
Output:
$ srvctl config listener
Name: LISTENER
Network: 1, Owner: oracle
Home: CRS home
End points: TCP:1521
Name: LISTENER_IB
Network: 2, Owner: oracle
Home: CRS home
End points: TCP:1522
-- Help 11.2.0.2
> srvctl add listener -help
Adds a listener configuration to the Oracle Clusterware.
Usage: srvctl add listener [-l lsnr_name] [-s] [-p "[TCP:]port[, ...][/IPC:key][/NMP:pipe_name][/TCPS:s_port] [/SDP:port]"] [-o oracle_home] [-k net_num]
-l lsnr_name Listener name (default name is LISTENER)
-o oracle_home ORACLE_HOME path (default value is CRS_HOME)
-k net_num network number (default number is 1)
-s Skip the checking of ports
-p "[TCP:]port[, ...][/IPC:key][/NMP:pipe_name][/TCPS:s_port] [/SDP:port]" Comma separated tcp ports or listener endpoints
-h Print usage
5.
Update $ORACLE_HOME/network/admin/tnsnames.ora file on all nodes:
Imp:
Each TNSNAMES.ORA file is different for each server depending on the Local and Remote listener settings.
Local Listener, have information about local node and local IB vip host information.
Remote Listener, have information about other IB vip hosts information except the Local node
Sample entries for node 1 (atl01db01):
Local:
LISTENER_IBLOCAL.WORLD =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db01-ibvip)(PORT = 1522))
)
)
LISTENER_IPLOCAL.WORLD =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db01)(PORT = 1521))
)
)
Remote:
LISTENER_IBREMOTE.WORLD =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db02-ibvip)(PORT = 1522)) -- no node 1 information on node 1!!
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db03-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db04-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db05-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db06-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db07-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db08-ibvip)(PORT = 1522))
)
)
LISTENER_IPREMOTE.WORLD =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = ATL02-SCAN)(PORT = 1521))
)
)
Imp:
Repeat the same for the other nodes, changing Local and Remote values accordingly.
6 .
-- Update the listeners_network DB parameter on all the databases with the listener addresses from both the networks, i.,e network1 and network2
alter system set listener_networks='((NAME=network2) (LOCAL_LISTENER=LISTENER_IBLOCAL)(REMOTE_LISTENER=LISTENER_IBREMOTE))','((NAME=network1)(LOCAL_LISTENER=LISTENER_IPLOCAL)(REMOTE_LISTENER=LISTENER_IPREMOTE))' scope=both sid='*';
-- Restart the listener and validate the entry.
show parameter listener_networks
-- What do I see with "ps -ef"?
> ps -ef | grep LISTENER
oracle 12874 1 0 May09 ? 00:01:20 /u01/app/11.2.0.2/grid/bin/tnslsnr LISTENER -inherit
oracle 24335 1 0 May12 ? 00:05:22 /u01/app/11.2.0.2/grid/bin/tnslsnr LISTENER_IB -inherit
7.
-- Add TNS entry on client side:
Sample TNS entry:
ODS.WORLD =
(DESCRIPTION =
(LOAD_BALANCE = ON)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db01-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db02-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db03-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db04-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db05-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db06-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db07-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db08-ibvip)(PORT = 1522))
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = ODS_INFA_SERVICE.world)
)
)
-- How did you create the Service?:
$ srvctl add service -d ODS -s ODS_INFA_SERVICE -r ODS1,ODS2,ODS3,ODS4 -a ODS5,ODS6,ODS7,ODS8 -q TRUE -m BASIC -e SELECT -z 180 -w 5 -j LONG -k2
-- Service configuration:
$ srvctl config service -d ODS -s ODS_INFA_SERVICE
Service name: ODS_INFA_SERVICE
Service is enabled
Server pool: ODS_ODS_INFA_SERVICE
Cardinality: 4
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: true
Failover type: SELECT
Failover method: BASIC
TAF failover retries: 180
TAF failover delay: 5
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
Preferred instances: ODS1,ODS2,ODS3,ODS4
Available instances: ODS5,ODS6,ODS7,ODS8
-- Help 11.2.0.2
> srvctl add service -help
Adds a service configuration to the Oracle Clusterware.
Usage: srvctl add service -d db_unique_name -s service_name {-r "preferred_list" [-a "available_list"] [-P {BASIC | NONE | PRECONNECT}]
-g pool_name [-c {UNIFORM | SINGLETON}] } [-k net_num] [-l [PRIMARY][,PHYSICAL_STANDBY][,LOGICAL_STANDBY][,SNAPSHOT_STANDBY]]
[-y {AUTOMATIC | MANUAL}] [-q {TRUE | FALSE}]
[-x {TRUE | FALSE}] [-j {SHORT | LONG}]
[-B {NONE | SERVICE_TIME | THROUGHPUT}]
[-e {NONE | SESSION | SELECT}] [-m {NONE | BASIC}]
[-z failover_retries] [-w failover_delay] [-t edition] [-f]
-d db_unique_name Unique name for the database
-s service Service name
-r "preferred_list" Comma separated list of preferred instances
-a "available_list" Comma separated list of available instances
-g pool_name Server pool name
-c {UNIFORM | SINGLETON} Service runs on every active server in the server pool hosting this service (UNIFORM) or just one server (SINGLETON)
-k net_num network number (default number is 1)
-P {NONE | BASIC | PRECONNECT} TAF policy specification
-l role Role of the service (primary, physical_standby, logical_standby, snapshot_standby)
-y policy Management policy for the service (AUTOMATIC or MANUAL)
-e Failover type Failover type (NONE, SESSION, or SELECT)
-m Failover method Failover method (NONE or BASIC)
-w integer Failover delay
-z integer Failover retries
-t edition Edition (or "" for empty edition value)
-j clb_goal Connection Load Balancing Goal (SHORT or LONG). Default is LONG.
-B Runtime Load Balancing Goal Runtime Load Balancing Goal (SERVICE_TIME, THROUGHPUT, or NONE)
-x Distributed Transaction Processing Distributed Transaction Processing (TRUE or FALSE)
-q AQ HA notifications AQ HA notifications (TRUE or FALSE)
Usage: srvctl add service -d db_unique_name -s service_name -u {-r "new_pref_inst"
-a "new_avail_inst"} [-f]
-d db_unique_name Unique name for the database
-s service Service name
-u Add a new instance to service configuration
-r new_pref_inst Name of new preferred instance
-a new_avail_inst Name of new available instance
-f Force the add operation even though a listener is not configured for a network
-h Print usage
InfiniBand listener can be used for ETL's, RMAN active duplicate procedure etc.. for fast data transfer using IB network.
Make sure following entries are in sqlnet.ora file on all the database servers, we use soft link from GRID home to DB home.
AUTOMATIC_IPC = ON
names.directory_path = (TNSNAMES,EZCONNECT)
names.default_domain = world
name.default_zone = world
sqlnet.expire_time = 15
1.
Add/Identify 8 new IP addresses on the private Infiniband, which are different from existing "-priv" IPs (Same subnet as the IB network) -- Network Admin Task.
Add Private Virtutal IPs to /etc/hosts for all Exadata compute nodes and client (Informatica Servers or any other):
### Compute Node Listener Interface details
192.168.16.xx atl01db01-ibvip.corp.dwire.com atl01db01-ibvip
192.168.16.xx atl01db02-ibvip.corp.dwire.com atl01db02-ibvip
192.168.16.xx atl01db03-ibvip.corp.dwire.com atl01db03-ibvip
192.168.16.xx atl01db04-ibvip.corp.dwire.com atl01db04-ibvip
192.168.16.xx atl01db05-ibvip.corp.dwire.com atl01db05-ibvip
192.168.16.xx atl01db06-ibvip.corp.dwire.com atl01db06-ibvip
192.168.16.xx atl01db07-ibvip.corp.dwire.com atl01db07-ibvip
192.168.16.xx atl01db08-ibvip.corp.dwire.com atl01db08-ibvip
-- Infiniband Sample config:
cat /etc/sysconfig/network-scripts/ifcfg-bondib0
#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=bondib0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.16.1
NETMASK=255.255.252.0
NETWORK=192.168.16.0
BROADCAST=192.168.19.255
BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000 num_grat_arp=100"
IPV6INIT=no
MTU=65520
2.
As root user, add secondary network to the cluster:
-- Currenly How many networks configured? -- Only one network found
/u01/app/11.2.0.2/grid/bin/crsctl stat res -t | grep net
Output:
$ /u01/app/11.2.0.2/grid/bin/crsctl stat res -t | grep net
ora.net1.network
-- Add the second network
/u01/app/11.2.0.2/grid/bin/srvctl add network -k 2 -S 192.168.16.0/255.255.252.0/bondib0
-- Help 11.2.0.2
> srvctl add network -help
Adds a network configuration to the Oracle Clusterware.
Usage: srvctl add network [-k net_num] -S subnet/netmask/[if1[if2...]] [-w network_type] [-v]
-k net_num network number (default number is 1)
-S subnet/netmask/[if1[if2...]] NET address spec for network
-w network_type The network type (static, dhcp, mixed)
-h Print usage
-v Verbose output
-- Verify newly added network has been added
/u01/app/11.2.0.2/grid/bin/crsctl stat res -t | grep net
Output:
$ /u01/app/11.2.0.2/grid/bin/crsctl stat res -t | grep net
ora.net1.network
ora.net2.network
3.
-- Register newly created Virtual IP's with the CRS as root user:
cd /u01/app/11.2.0.2/grid/bin
srvctl add vip -n atl01db01 -A atl01db01-ibvip/255.255.252.0/bondib0 -k 2
srvctl add vip -n atl01db02 -A atl01db02-ibvip/255.255.252.0/bondib0 -k 2
srvctl add vip -n atl01db03 -A atl01db03-ibvip/255.255.252.0/bondib0 -k 2
srvctl add vip -n atl01db04 -A atl01db04-ibvip/255.255.252.0/bondib0 -k 2
srvctl add vip -n atl01db05 -A atl01db05-ibvip/255.255.252.0/bondib0 -k 2
srvctl add vip -n atl01db06 -A atl01db06-ibvip/255.255.252.0/bondib0 -k 2
srvctl add vip -n atl01db07 -A atl01db07-ibvip/255.255.252.0/bondib0 -k 2
srvctl add vip -n atl01db08 -A atl01db08-ibvip/255.255.252.0/bondib0 -k 2
4.
As oracle user:
-- Register newly added listener to CRS to make sure listener will come up automatically after the server restart.
srvctl add listener -l LISTENER_IB -k 2 -p TCP:1522
srvctl config listener
Output:
$ srvctl config listener
Name: LISTENER
Network: 1, Owner: oracle
Home: CRS home
End points: TCP:1521
Name: LISTENER_IB
Network: 2, Owner: oracle
Home: CRS home
End points: TCP:1522
-- Help 11.2.0.2
> srvctl add listener -help
Adds a listener configuration to the Oracle Clusterware.
Usage: srvctl add listener [-l lsnr_name] [-s] [-p "[TCP:]port[, ...][/IPC:key][/NMP:pipe_name][/TCPS:s_port] [/SDP:port]"] [-o oracle_home] [-k net_num]
-l lsnr_name Listener name (default name is LISTENER)
-o oracle_home ORACLE_HOME path (default value is CRS_HOME)
-k net_num network number (default number is 1)
-s Skip the checking of ports
-p "[TCP:]port[, ...][/IPC:key][/NMP:pipe_name][/TCPS:s_port] [/SDP:port]" Comma separated tcp ports or listener endpoints
-h Print usage
5.
Update $ORACLE_HOME/network/admin/tnsnames.ora file on all nodes:
Imp:
Each TNSNAMES.ORA file is different for each server depending on the Local and Remote listener settings.
Local Listener, have information about local node and local IB vip host information.
Remote Listener, have information about other IB vip hosts information except the Local node
Sample entries for node 1 (atl01db01):
Local:
LISTENER_IBLOCAL.WORLD =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db01-ibvip)(PORT = 1522))
)
)
LISTENER_IPLOCAL.WORLD =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db01)(PORT = 1521))
)
)
Remote:
LISTENER_IBREMOTE.WORLD =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db02-ibvip)(PORT = 1522)) -- no node 1 information on node 1!!
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db03-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db04-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db05-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db06-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db07-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db08-ibvip)(PORT = 1522))
)
)
LISTENER_IPREMOTE.WORLD =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = ATL02-SCAN)(PORT = 1521))
)
)
Imp:
Repeat the same for the other nodes, changing Local and Remote values accordingly.
6 .
-- Update the listeners_network DB parameter on all the databases with the listener addresses from both the networks, i.,e network1 and network2
alter system set listener_networks='((NAME=network2) (LOCAL_LISTENER=LISTENER_IBLOCAL)(REMOTE_LISTENER=LISTENER_IBREMOTE))','((NAME=network1)(LOCAL_LISTENER=LISTENER_IPLOCAL)(REMOTE_LISTENER=LISTENER_IPREMOTE))' scope=both sid='*';
-- Restart the listener and validate the entry.
show parameter listener_networks
-- What do I see with "ps -ef"?
> ps -ef | grep LISTENER
oracle 12874 1 0 May09 ? 00:01:20 /u01/app/11.2.0.2/grid/bin/tnslsnr LISTENER -inherit
oracle 24335 1 0 May12 ? 00:05:22 /u01/app/11.2.0.2/grid/bin/tnslsnr LISTENER_IB -inherit
7.
-- Add TNS entry on client side:
Sample TNS entry:
ODS.WORLD =
(DESCRIPTION =
(LOAD_BALANCE = ON)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db01-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db02-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db03-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db04-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db05-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db06-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db07-ibvip)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = atl01db08-ibvip)(PORT = 1522))
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = ODS_INFA_SERVICE.world)
)
)
-- How did you create the Service?:
$ srvctl add service -d ODS -s ODS_INFA_SERVICE -r ODS1,ODS2,ODS3,ODS4 -a ODS5,ODS6,ODS7,ODS8 -q TRUE -m BASIC -e SELECT -z 180 -w 5 -j LONG -k2
-- Service configuration:
$ srvctl config service -d ODS -s ODS_INFA_SERVICE
Service name: ODS_INFA_SERVICE
Service is enabled
Server pool: ODS_ODS_INFA_SERVICE
Cardinality: 4
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: true
Failover type: SELECT
Failover method: BASIC
TAF failover retries: 180
TAF failover delay: 5
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
Preferred instances: ODS1,ODS2,ODS3,ODS4
Available instances: ODS5,ODS6,ODS7,ODS8
-- Help 11.2.0.2
> srvctl add service -help
Adds a service configuration to the Oracle Clusterware.
Usage: srvctl add service -d db_unique_name -s service_name {-r "preferred_list" [-a "available_list"] [-P {BASIC | NONE | PRECONNECT}]
-g pool_name [-c {UNIFORM | SINGLETON}] } [-k net_num] [-l [PRIMARY][,PHYSICAL_STANDBY][,LOGICAL_STANDBY][,SNAPSHOT_STANDBY]]
[-y {AUTOMATIC | MANUAL}] [-q {TRUE | FALSE}]
[-x {TRUE | FALSE}] [-j {SHORT | LONG}]
[-B {NONE | SERVICE_TIME | THROUGHPUT}]
[-e {NONE | SESSION | SELECT}] [-m {NONE | BASIC}]
[-z failover_retries] [-w failover_delay] [-t edition] [-f]
-d db_unique_name Unique name for the database
-s service Service name
-r "preferred_list" Comma separated list of preferred instances
-a "available_list" Comma separated list of available instances
-g pool_name Server pool name
-c {UNIFORM | SINGLETON} Service runs on every active server in the server pool hosting this service (UNIFORM) or just one server (SINGLETON)
-k net_num network number (default number is 1)
-P {NONE | BASIC | PRECONNECT} TAF policy specification
-l role Role of the service (primary, physical_standby, logical_standby, snapshot_standby)
-y policy Management policy for the service (AUTOMATIC or MANUAL)
-e Failover type Failover type (NONE, SESSION, or SELECT)
-m Failover method Failover method (NONE or BASIC)
-w integer Failover delay
-z integer Failover retries
-t edition Edition (or "" for empty edition value)
-j clb_goal Connection Load Balancing Goal (SHORT or LONG). Default is LONG.
-B Runtime Load Balancing Goal Runtime Load Balancing Goal (SERVICE_TIME, THROUGHPUT, or NONE)
-x Distributed Transaction Processing Distributed Transaction Processing (TRUE or FALSE)
-q AQ HA notifications AQ HA notifications (TRUE or FALSE)
Usage: srvctl add service -d db_unique_name -s service_name -u {-r "new_pref_inst"
-a "new_avail_inst"} [-f]
-d db_unique_name Unique name for the database
-s service Service name
-u Add a new instance to service configuration
-r new_pref_inst Name of new preferred instance
-a new_avail_inst Name of new available instance
-f Force the add operation even though a listener is not configured for a network
-h Print usage
Friday, May 25, 2012
Exadata Flash Cache Maintenance:
Exadata Flash Cache Maintenance:
This is one of the common maintenance task on Exadata cell nodes; just want to publish the steps to correct Flash Cache issue.
-- Pre re-create flashcache steps
1.
-- Run the following command to check if there are other offline disks
CellCLI> LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != 'Yes'
If any grid disks are returned, then it is not safe to take the storage server offline because proper Oracle ASM disk group redundancy will not be intact.
Taking the storage server offline when one or more grid disks are in this state will cause Oracle ASM to dismount the affected disk group, causing the databases to shut down abruptly.
Output:
CellCLI> LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != 'Yes' -- No records selected. -- Safe
CellCLI>
CellCLI> LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome = 'Yes'
DATA_CD_00_atl02cel06
DATA_CD_01_atl02cel06
DATA_CD_02_atl02cel06
DATA_CD_03_atl02cel06
DATA_CD_04_atl02cel06
DATA_CD_05_atl02cel06
DATA_CD_06_atl02cel06
DATA_CD_07_atl02cel06
DATA_CD_08_atl02cel06
DATA_CD_09_atl02cel06
DATA_CD_10_atl02cel06
DATA_CD_11_atl02cel06
RECO_CD_00_atl02cel06
RECO_CD_01_atl02cel06
RECO_CD_02_atl02cel06
RECO_CD_03_atl02cel06
RECO_CD_04_atl02cel06
RECO_CD_05_atl02cel06
RECO_CD_06_atl02cel06
RECO_CD_07_atl02cel06
RECO_CD_08_atl02cel06
RECO_CD_09_atl02cel06
RECO_CD_10_atl02cel06
RECO_CD_11_atl02cel06
SYSTEMDG_CD_02_atl02cel06
SYSTEMDG_CD_03_atl02cel06
SYSTEMDG_CD_04_atl02cel06
SYSTEMDG_CD_05_atl02cel06
SYSTEMDG_CD_06_atl02cel06
SYSTEMDG_CD_07_atl02cel06
SYSTEMDG_CD_08_atl02cel06
SYSTEMDG_CD_09_atl02cel06
SYSTEMDG_CD_10_atl02cel06
SYSTEMDG_CD_11_atl02cel06
2.
-- Get a listing of physical disks and celldisk. You can use this later to match up the WWN's to ensure you are removing the correct module.
CellCLI> list physicaldisk
CellCLI> list celldisk
Output:
-- Missing disk output: - missing few disks from 3 and 4 in this output.
CellCLI> list physicaldisk
16:0 JK1130YAHARN4T normal
16:1 JK1130YAHBWWZT normal
16:2 JK1130YAHBV76T normal
16:3 JK1130YAHBV7TT normal
16:4 JK1130YAH49UJT normal
16:5 JK1130YAHBJ5HT normal
16:6 JK1130YAHBJNHT normal
16:7 JK1130YAHATA3T normal
16:8 JK1130YAHAJX7T normal
16:9 JK1130YAHBJNJT normal
16:10 JK1130YAHBSBZT normal
16:11 JK1130YAHBV7PT normal
[1:0:0:0] 5080020000c47c0FMOD0 normal
[1:0:1:0] 5080020000c47c0FMOD1 normal
[1:0:2:0] 5080020000c47c0FMOD2 normal
[1:0:3:0] 5080020000c47c0FMOD3 normal
[2:0:0:0] 5080020000c47eaFMOD0 normal
[2:0:1:0] 5080020000c47eaFMOD1 normal
[2:0:2:0] 5080020000c47eaFMOD2 normal
[2:0:3:0] 5080020000c47eaFMOD3 normal
-- Some times output may look like this. -- This output is from different server
CellCLI> list physicaldisk
24:0 JK1130YAG52NAT normal
24:1 JK1130YAG4EPZT normal
24:2 JK1130YAG0VABT normal
24:3 JK1130YAG536XT normal
24:4 JK1130YAG51TVT normal
24:5 JK1130YAG0VA9T normal
24:6 JK1130YAG51TUT normal
24:7 JK1130YAG0VRKT normal
24:8 JK1130YAG51ZYT normal
24:9 JK1130YAG0VD4T normal
24:10 JK1130YAG0VN3T normal
24:11 JK1130YAG52NKT normal
[1:0:0:0] 5080020000c4262FMOD0 normal
[1:0:1:0] 5080020000c4262FMOD1 normal
[1:0:2:0] 5080020000c4262FMOD2 normal
[1:0:3:0] 5080020000c4262FMOD3 normal
[2:0:0:0] 5080020000c421cFMOD0 normal
[2:0:1:0] 5080020000c421cFMOD1 normal
[2:0:2:0] 5080020000c421cFMOD2 normal
[3:0:0:0] 5080020000c422eFMOD0 normal
[3:0:1:0] 5080020000c422eFMOD1 poor performance
[3:0:2:0] 5080020000c422eFMOD2 poor performance
[3:0:3:0] 5080020000c422eFMOD3 poor performance
[4:0:0:0] 5080020000c427cFMOD0 normal
[4:0:1:0] 5080020000c427cFMOD1 normal
[4:0:2:0] 5080020000c427cFMOD2 normal
[4:0:3:0] 5080020000c427cFMOD3 normal
CellCLI> list celldisk
CD_00_atl02cel06 normal
CD_01_atl02cel06 normal
CD_02_atl02cel06 normal
CD_03_atl02cel06 normal
CD_04_atl02cel06 normal
CD_05_atl02cel06 normal
CD_06_atl02cel06 normal
CD_07_atl02cel06 normal
CD_08_atl02cel06 normal
CD_09_atl02cel06 normal
CD_10_atl02cel06 normal
CD_11_atl02cel06 normal
FD_00_atl02cel06 normal
FD_01_atl02cel06 normal
FD_02_atl02cel06 normal
FD_03_atl02cel06 normal
FD_04_atl02cel06 not present
FD_05_atl02cel06 not present
FD_06_atl02cel06 not present
FD_07_atl02cel06 not present
FD_08_atl02cel06 normal
FD_09_atl02cel06 normal
FD_10_atl02cel06 normal
FD_11_atl02cel06 normal
FD_12_atl02cel06 not present
FD_13_atl02cel06 not present
FD_14_atl02cel06 not present
FD_15_atl02cel06 not present
3.
-- Inactivate all the grid disks when Oracle Exadata Storage Server is safe to take offline using the following command:
CellCLI> ALTER GRIDDISK ALL INACTIVE
The preceding command will complete once all disks are inactive and offline.
Depending on the storage server activity, it may take several minutes for this command to complete.
4.
-- Verify all grid disks are INACTIVE to allow safe storage server shut down by running the following command:
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
CellCLI> LIST GRIDDISK
If all grid disks are INACTIVE, then the storage server can be shutdown without affecting database availability
Output:
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
DATA_CD_00_atl02cel06 UNKNOWN
DATA_CD_01_atl02cel06 UNKNOWN
DATA_CD_02_atl02cel06 UNKNOWN
DATA_CD_03_atl02cel06 UNKNOWN
DATA_CD_04_atl02cel06 UNKNOWN
DATA_CD_05_atl02cel06 UNKNOWN
DATA_CD_06_atl02cel06 UNKNOWN
DATA_CD_07_atl02cel06 UNKNOWN
DATA_CD_08_atl02cel06 UNKNOWN
DATA_CD_09_atl02cel06 UNKNOWN
DATA_CD_10_atl02cel06 UNKNOWN
DATA_CD_11_atl02cel06 UNKNOWN
RECO_CD_00_atl02cel06 UNKNOWN
RECO_CD_01_atl02cel06 UNKNOWN
RECO_CD_02_atl02cel06 UNKNOWN
RECO_CD_03_atl02cel06 UNKNOWN
RECO_CD_04_atl02cel06 UNKNOWN
RECO_CD_05_atl02cel06 UNKNOWN
RECO_CD_06_atl02cel06 UNKNOWN
RECO_CD_07_atl02cel06 UNKNOWN
RECO_CD_08_atl02cel06 UNKNOWN
RECO_CD_09_atl02cel06 UNKNOWN
RECO_CD_10_atl02cel06 UNKNOWN
RECO_CD_11_atl02cel06 UNKNOWN
SYSTEMDG_CD_02_atl02cel06 UNKNOWN
SYSTEMDG_CD_03_atl02cel06 UNKNOWN
SYSTEMDG_CD_04_atl02cel06 UNKNOWN
SYSTEMDG_CD_05_atl02cel06 UNKNOWN
SYSTEMDG_CD_06_atl02cel06 UNKNOWN
SYSTEMDG_CD_07_atl02cel06 UNKNOWN
SYSTEMDG_CD_08_atl02cel06 UNKNOWN
SYSTEMDG_CD_09_atl02cel06 UNKNOWN
SYSTEMDG_CD_10_atl02cel06 UNKNOWN
SYSTEMDG_CD_11_atl02cel06 UNKNOWN
-- This should be INACTIVE, when you do "ALTER GRIDDISK ALL INACTIVE"
CellCLI> LIST GRIDDISK
DATA_CD_00_atl02cel06 active
DATA_CD_01_atl02cel06 active
DATA_CD_02_atl02cel06 active
DATA_CD_03_atl02cel06 active
DATA_CD_04_atl02cel06 active
DATA_CD_05_atl02cel06 active
DATA_CD_06_atl02cel06 active
DATA_CD_07_atl02cel06 active
DATA_CD_08_atl02cel06 active
DATA_CD_09_atl02cel06 active
DATA_CD_10_atl02cel06 active
DATA_CD_11_atl02cel06 active
RECO_CD_00_atl02cel06 active
RECO_CD_01_atl02cel06 active
RECO_CD_02_atl02cel06 active
RECO_CD_03_atl02cel06 active
RECO_CD_04_atl02cel06 active
RECO_CD_05_atl02cel06 active
RECO_CD_06_atl02cel06 active
RECO_CD_07_atl02cel06 active
RECO_CD_08_atl02cel06 active
RECO_CD_09_atl02cel06 active
RECO_CD_10_atl02cel06 active
RECO_CD_11_atl02cel06 active
SYSTEMDG_CD_02_atl02cel06 active
SYSTEMDG_CD_03_atl02cel06 active
SYSTEMDG_CD_04_atl02cel06 active
SYSTEMDG_CD_05_atl02cel06 active
SYSTEMDG_CD_06_atl02cel06 active
SYSTEMDG_CD_07_atl02cel06 active
SYSTEMDG_CD_08_atl02cel06 active
SYSTEMDG_CD_09_atl02cel06 active
SYSTEMDG_CD_10_atl02cel06 active
SYSTEMDG_CD_11_atl02cel06 active
5.
-- Check the status of FLASHCACHE
CellCLI> LIST FLASHCACHE detail
Output: -- When have issue.
CellCLI> LIST FLASHCACHE detail
name: atl02cel06_FLASHCACHE
cellDisk: FD_10_atl02cel06,FD_09_atl02cel06,FD_01_atl02cel06,FD_11_atl02cel06,FD_03_atl02cel06,FD_08_atl02cel06,FD_00_atl02cel06,FD_02_atl02cel06
creationTime: 2010-07-14T19:51:22-04:00
degradedCelldisks: FD_13_atl02cel06,FD_14_atl02cel06,FD_07_atl02cel06,FD_15_atl02cel06,FD_05_atl02cel06,FD_06_atl02cel06,FD_04_atl02cel06,FD_12_atl02cel06
effectiveCacheSize: 182.625G
id: 60a179f7-2fdf-44f4-a63d-9cfdd03d42cc
size: 365.25G
status: warning
-- Status should be normal, but here it's warning...
6.
-- We can now drop the flashcache here using the command
CellCLI> drop flashcache all
This will ensure that on startup the flashcache does not start
7.
-- Stop the cell services using the following command:
CellCLI> ALTER CELL SHUTDOWN SERVICES ALL
8.
Unix SA -- REBOOT atl02cel06.(root) for replacing the FLASHCACHE cards
#shutdown -h -y now
The cell services will be started automatically.
---------------------------------------------------
a. Replace the failed flash disk based on the PCI number and FDOM number.
NOTE: You can also use the WWN from the "list physicaldisk" command earlier.
Example: [2:0:0:0] 5080020000fcf12FMOD0 normal
[2:0:0:0] 5080020000f "cf12" FMOD0 normal
The number "cf12" above should be on an orange sticker on the back of each flash pci card.
b. Power up the cell. The cell services will be started automatically.
c. After boot, verify the replaced disks have the same firmware...
/opt/oracle.SupportTools/CheckHWnFWProfile -c strict
-OR-
dmesg | grep -i marvell # Verify all revisions are the same
-- If firmware is not the same across all disks, then do the following...
rm /opt/oracle.cellos/TRIED_FW_UPDATE_ONCE
reboot
---------------------------------------------------
-- Re-create flashcache Steps
9.
-- Configure Flash Cache
CellCLI> drop celldisk all flashdisk force
CellCLI> create celldisk all flashdisk
CellCLI> create flashcache all
CellCLI> LIST FLASHCACHE detail
10.
-- Bring all grid disks online using the following command:
CellCLI> ALTER GRIDDISK ALL ACTIVE
Note:
-- When the grid disks become active, Oracle ASM will automatically synchronize the gird disks to bring them back into the disk group.
-- If ASM is down, need to bring up ASM on atleast one node to see the celldisk "ONLINE", if not celldisk will NOT be in ONLINE until the ASM is up!!
--Validation steps
11.
-- Verify all grid disks have been successfully put online using the following command:
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
Wait until asmmodestatus is ONLINE for all grid disks.
Output:
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
DATA_CD_00_atl02cel06 ONLINE
DATA_CD_01_atl02cel06 ONLINE
DATA_CD_02_atl02cel06 ONLINE
DATA_CD_03_atl02cel06 ONLINE
DATA_CD_04_atl02cel06 ONLINE
DATA_CD_05_atl02cel06 ONLINE
DATA_CD_06_atl02cel06 ONLINE
DATA_CD_07_atl02cel06 ONLINE
DATA_CD_08_atl02cel06 ONLINE
DATA_CD_09_atl02cel06 ONLINE
DATA_CD_10_atl02cel06 ONLINE
DATA_CD_11_atl02cel06 ONLINE
RECO_CD_00_atl02cel06 ONLINE
RECO_CD_01_atl02cel06 ONLINE
RECO_CD_02_atl02cel06 ONLINE
RECO_CD_03_atl02cel06 ONLINE
RECO_CD_04_atl02cel06 ONLINE
RECO_CD_05_atl02cel06 ONLINE
RECO_CD_06_atl02cel06 ONLINE
RECO_CD_07_atl02cel06 ONLINE
RECO_CD_08_atl02cel06 ONLINE
RECO_CD_09_atl02cel06 ONLINE
RECO_CD_10_atl02cel06 ONLINE
RECO_CD_11_atl02cel06 ONLINE
SYSTEMDG_CD_02_atl02cel06 ONLINE
SYSTEMDG_CD_03_atl02cel06 ONLINE
SYSTEMDG_CD_04_atl02cel06 ONLINE
SYSTEMDG_CD_05_atl02cel06 ONLINE
SYSTEMDG_CD_06_atl02cel06 ONLINE
SYSTEMDG_CD_07_atl02cel06 ONLINE
SYSTEMDG_CD_08_atl02cel06 ONLINE
SYSTEMDG_CD_09_atl02cel06 ONLINE
SYSTEMDG_CD_10_atl02cel06 ONLINE
SYSTEMDG_CD_11_atl02cel06 ONLINE
12.
-- Oracle ASM synchronization is only complete when all grid disks show asmmodestatus=ONLINE.
-- Before taking another storage server offline, Oracle ASM synchronization must complete on the restarted Oracle Exadata Storage Server.
-- If synchronization is not complete, then the check performed on another storage server will fail.
CellCLI> list griddisk attributes name where asmdeactivationoutcome != 'Yes'
13.
-- Wait until asmmodestatus shows ONLINE for all grid disks.
-- List the disks and confirm they are all normal.
CellCLI> LIST GRIDDISK ATTRIBUTES asmmodestatus
CellCLI> list physicaldisk
CellCLI> list celldisk
CellCLI> list griddisk
-- All disks are visable after re-creating the flashcache/ compare the output from Step 2
CellCLI> list physicaldisk
16:0 JK1130YAHARN4T normal
16:1 JK1130YAHBWWZT normal
16:2 JK1130YAHBV76T normal
16:3 JK1130YAHBV7TT normal
16:4 JK1130YAH49UJT normal
16:5 JK1130YAHBJ5HT normal
16:6 JK1130YAHBJNHT normal
16:7 JK1130YAHATA3T normal
16:8 JK1130YAHAJX7T normal
16:9 JK1130YAHBJNJT normal
16:10 JK1130YAHBSBZT normal
16:11 JK1130YAHBV7PT normal
[1:0:0:0] 5080020000c47c0FMOD0 normal
[1:0:1:0] 5080020000c47c0FMOD1 normal
[1:0:2:0] 5080020000c47c0FMOD2 normal
[1:0:3:0] 5080020000c47c0FMOD3 normal
[2:0:0:0] 5080020000c47eaFMOD0 normal
[2:0:1:0] 5080020000c47eaFMOD1 normal
[2:0:2:0] 5080020000c47eaFMOD2 normal
[2:0:3:0] 5080020000c47eaFMOD3 normal
[3:0:0:0] 5080020000c47feFMOD0 normal
[3:0:1:0] 5080020000c47feFMOD1 normal
[3:0:2:0] 5080020000c47feFMOD2 normal
[3:0:3:0] 5080020000c47feFMOD3 normal
[4:0:0:0] 5080020000c47b2FMOD0 normal
[4:0:1:0] 5080020000c47b2FMOD1 normal
[4:0:2:0] 5080020000c47b2FMOD2 normal
[4:0:3:0] 5080020000c47b2FMOD3 normal
--
CellCLI> list celldisk
CD_00_atl02cel06 normal
CD_01_atl02cel06 normal
CD_02_atl02cel06 normal
CD_03_atl02cel06 normal
CD_04_atl02cel06 normal
CD_05_atl02cel06 normal
CD_06_atl02cel06 normal
CD_07_atl02cel06 normal
CD_08_atl02cel06 normal
CD_09_atl02cel06 normal
CD_10_atl02cel06 normal
CD_11_atl02cel06 normal
FD_00_atl02cel06 normal
FD_01_atl02cel06 normal
FD_02_atl02cel06 normal
FD_03_atl02cel06 normal
FD_04_atl02cel06 normal
FD_05_atl02cel06 normal
FD_06_atl02cel06 normal
FD_07_atl02cel06 normal
FD_08_atl02cel06 normal
FD_09_atl02cel06 normal
FD_10_atl02cel06 normal
FD_11_atl02cel06 normal
FD_12_atl02cel06 normal
FD_13_atl02cel06 normal
FD_14_atl02cel06 normal
FD_15_atl02cel06 normal
--
CellCLI> list griddisk
DATA_CD_00_atl02cel06 active
DATA_CD_01_atl02cel06 active
DATA_CD_02_atl02cel06 active
DATA_CD_03_atl02cel06 active
DATA_CD_04_atl02cel06 active
DATA_CD_05_atl02cel06 active
DATA_CD_06_atl02cel06 active
DATA_CD_07_atl02cel06 active
DATA_CD_08_atl02cel06 active
DATA_CD_09_atl02cel06 active
DATA_CD_10_atl02cel06 active
DATA_CD_11_atl02cel06 active
RECO_CD_00_atl02cel06 active
RECO_CD_01_atl02cel06 active
RECO_CD_02_atl02cel06 active
RECO_CD_03_atl02cel06 active
RECO_CD_04_atl02cel06 active
RECO_CD_05_atl02cel06 active
RECO_CD_06_atl02cel06 active
RECO_CD_07_atl02cel06 active
RECO_CD_08_atl02cel06 active
RECO_CD_09_atl02cel06 active
RECO_CD_10_atl02cel06 active
RECO_CD_11_atl02cel06 active
SYSTEMDG_CD_02_atl02cel06 active
SYSTEMDG_CD_03_atl02cel06 active
SYSTEMDG_CD_04_atl02cel06 active
SYSTEMDG_CD_05_atl02cel06 active
SYSTEMDG_CD_06_atl02cel06 active
SYSTEMDG_CD_07_atl02cel06 active
SYSTEMDG_CD_08_atl02cel06 active
SYSTEMDG_CD_09_atl02cel06 active
SYSTEMDG_CD_10_atl02cel06 active
SYSTEMDG_CD_11_atl02cel06 active
14.
-- Go to all the cells and run this command:
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
--
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
DATA_CD_00_atl02cel06 ONLINE
DATA_CD_01_atl02cel06 ONLINE
DATA_CD_02_atl02cel06 ONLINE
DATA_CD_03_atl02cel06 ONLINE
DATA_CD_04_atl02cel06 ONLINE
DATA_CD_05_atl02cel06 ONLINE
DATA_CD_06_atl02cel06 ONLINE
DATA_CD_07_atl02cel06 ONLINE
DATA_CD_08_atl02cel06 ONLINE
DATA_CD_09_atl02cel06 ONLINE
DATA_CD_10_atl02cel06 ONLINE
DATA_CD_11_atl02cel06 ONLINE
RECO_CD_00_atl02cel06 ONLINE
RECO_CD_01_atl02cel06 ONLINE
RECO_CD_02_atl02cel06 ONLINE
RECO_CD_03_atl02cel06 ONLINE
RECO_CD_04_atl02cel06 ONLINE
RECO_CD_05_atl02cel06 ONLINE
RECO_CD_06_atl02cel06 ONLINE
RECO_CD_07_atl02cel06 ONLINE
RECO_CD_08_atl02cel06 ONLINE
RECO_CD_09_atl02cel06 ONLINE
RECO_CD_10_atl02cel06 ONLINE
RECO_CD_11_atl02cel06 ONLINE
SYSTEMDG_CD_02_atl02cel06 ONLINE
SYSTEMDG_CD_03_atl02cel06 ONLINE
SYSTEMDG_CD_04_atl02cel06 ONLINE
SYSTEMDG_CD_05_atl02cel06 ONLINE
SYSTEMDG_CD_06_atl02cel06 ONLINE
SYSTEMDG_CD_07_atl02cel06 ONLINE
SYSTEMDG_CD_08_atl02cel06 ONLINE
SYSTEMDG_CD_09_atl02cel06 ONLINE
SYSTEMDG_CD_10_atl02cel06 ONLINE
SYSTEMDG_CD_11_atl02cel06 ONLINE
This is one of the common maintenance task on Exadata cell nodes; just want to publish the steps to correct Flash Cache issue.
-- Pre re-create flashcache steps
1.
-- Run the following command to check if there are other offline disks
CellCLI> LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != 'Yes'
If any grid disks are returned, then it is not safe to take the storage server offline because proper Oracle ASM disk group redundancy will not be intact.
Taking the storage server offline when one or more grid disks are in this state will cause Oracle ASM to dismount the affected disk group, causing the databases to shut down abruptly.
Output:
CellCLI> LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != 'Yes' -- No records selected. -- Safe
CellCLI>
CellCLI> LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome = 'Yes'
DATA_CD_00_atl02cel06
DATA_CD_01_atl02cel06
DATA_CD_02_atl02cel06
DATA_CD_03_atl02cel06
DATA_CD_04_atl02cel06
DATA_CD_05_atl02cel06
DATA_CD_06_atl02cel06
DATA_CD_07_atl02cel06
DATA_CD_08_atl02cel06
DATA_CD_09_atl02cel06
DATA_CD_10_atl02cel06
DATA_CD_11_atl02cel06
RECO_CD_00_atl02cel06
RECO_CD_01_atl02cel06
RECO_CD_02_atl02cel06
RECO_CD_03_atl02cel06
RECO_CD_04_atl02cel06
RECO_CD_05_atl02cel06
RECO_CD_06_atl02cel06
RECO_CD_07_atl02cel06
RECO_CD_08_atl02cel06
RECO_CD_09_atl02cel06
RECO_CD_10_atl02cel06
RECO_CD_11_atl02cel06
SYSTEMDG_CD_02_atl02cel06
SYSTEMDG_CD_03_atl02cel06
SYSTEMDG_CD_04_atl02cel06
SYSTEMDG_CD_05_atl02cel06
SYSTEMDG_CD_06_atl02cel06
SYSTEMDG_CD_07_atl02cel06
SYSTEMDG_CD_08_atl02cel06
SYSTEMDG_CD_09_atl02cel06
SYSTEMDG_CD_10_atl02cel06
SYSTEMDG_CD_11_atl02cel06
2.
-- Get a listing of physical disks and celldisk. You can use this later to match up the WWN's to ensure you are removing the correct module.
CellCLI> list physicaldisk
CellCLI> list celldisk
Output:
-- Missing disk output: - missing few disks from 3 and 4 in this output.
CellCLI> list physicaldisk
16:0 JK1130YAHARN4T normal
16:1 JK1130YAHBWWZT normal
16:2 JK1130YAHBV76T normal
16:3 JK1130YAHBV7TT normal
16:4 JK1130YAH49UJT normal
16:5 JK1130YAHBJ5HT normal
16:6 JK1130YAHBJNHT normal
16:7 JK1130YAHATA3T normal
16:8 JK1130YAHAJX7T normal
16:9 JK1130YAHBJNJT normal
16:10 JK1130YAHBSBZT normal
16:11 JK1130YAHBV7PT normal
[1:0:0:0] 5080020000c47c0FMOD0 normal
[1:0:1:0] 5080020000c47c0FMOD1 normal
[1:0:2:0] 5080020000c47c0FMOD2 normal
[1:0:3:0] 5080020000c47c0FMOD3 normal
[2:0:0:0] 5080020000c47eaFMOD0 normal
[2:0:1:0] 5080020000c47eaFMOD1 normal
[2:0:2:0] 5080020000c47eaFMOD2 normal
[2:0:3:0] 5080020000c47eaFMOD3 normal
-- Some times output may look like this. -- This output is from different server
CellCLI> list physicaldisk
24:0 JK1130YAG52NAT normal
24:1 JK1130YAG4EPZT normal
24:2 JK1130YAG0VABT normal
24:3 JK1130YAG536XT normal
24:4 JK1130YAG51TVT normal
24:5 JK1130YAG0VA9T normal
24:6 JK1130YAG51TUT normal
24:7 JK1130YAG0VRKT normal
24:8 JK1130YAG51ZYT normal
24:9 JK1130YAG0VD4T normal
24:10 JK1130YAG0VN3T normal
24:11 JK1130YAG52NKT normal
[1:0:0:0] 5080020000c4262FMOD0 normal
[1:0:1:0] 5080020000c4262FMOD1 normal
[1:0:2:0] 5080020000c4262FMOD2 normal
[1:0:3:0] 5080020000c4262FMOD3 normal
[2:0:0:0] 5080020000c421cFMOD0 normal
[2:0:1:0] 5080020000c421cFMOD1 normal
[2:0:2:0] 5080020000c421cFMOD2 normal
[3:0:0:0] 5080020000c422eFMOD0 normal
[3:0:1:0] 5080020000c422eFMOD1 poor performance
[3:0:2:0] 5080020000c422eFMOD2 poor performance
[3:0:3:0] 5080020000c422eFMOD3 poor performance
[4:0:0:0] 5080020000c427cFMOD0 normal
[4:0:1:0] 5080020000c427cFMOD1 normal
[4:0:2:0] 5080020000c427cFMOD2 normal
[4:0:3:0] 5080020000c427cFMOD3 normal
CellCLI> list celldisk
CD_00_atl02cel06 normal
CD_01_atl02cel06 normal
CD_02_atl02cel06 normal
CD_03_atl02cel06 normal
CD_04_atl02cel06 normal
CD_05_atl02cel06 normal
CD_06_atl02cel06 normal
CD_07_atl02cel06 normal
CD_08_atl02cel06 normal
CD_09_atl02cel06 normal
CD_10_atl02cel06 normal
CD_11_atl02cel06 normal
FD_00_atl02cel06 normal
FD_01_atl02cel06 normal
FD_02_atl02cel06 normal
FD_03_atl02cel06 normal
FD_04_atl02cel06 not present
FD_05_atl02cel06 not present
FD_06_atl02cel06 not present
FD_07_atl02cel06 not present
FD_08_atl02cel06 normal
FD_09_atl02cel06 normal
FD_10_atl02cel06 normal
FD_11_atl02cel06 normal
FD_12_atl02cel06 not present
FD_13_atl02cel06 not present
FD_14_atl02cel06 not present
FD_15_atl02cel06 not present
3.
-- Inactivate all the grid disks when Oracle Exadata Storage Server is safe to take offline using the following command:
CellCLI> ALTER GRIDDISK ALL INACTIVE
The preceding command will complete once all disks are inactive and offline.
Depending on the storage server activity, it may take several minutes for this command to complete.
4.
-- Verify all grid disks are INACTIVE to allow safe storage server shut down by running the following command:
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
CellCLI> LIST GRIDDISK
If all grid disks are INACTIVE, then the storage server can be shutdown without affecting database availability
Output:
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
DATA_CD_00_atl02cel06 UNKNOWN
DATA_CD_01_atl02cel06 UNKNOWN
DATA_CD_02_atl02cel06 UNKNOWN
DATA_CD_03_atl02cel06 UNKNOWN
DATA_CD_04_atl02cel06 UNKNOWN
DATA_CD_05_atl02cel06 UNKNOWN
DATA_CD_06_atl02cel06 UNKNOWN
DATA_CD_07_atl02cel06 UNKNOWN
DATA_CD_08_atl02cel06 UNKNOWN
DATA_CD_09_atl02cel06 UNKNOWN
DATA_CD_10_atl02cel06 UNKNOWN
DATA_CD_11_atl02cel06 UNKNOWN
RECO_CD_00_atl02cel06 UNKNOWN
RECO_CD_01_atl02cel06 UNKNOWN
RECO_CD_02_atl02cel06 UNKNOWN
RECO_CD_03_atl02cel06 UNKNOWN
RECO_CD_04_atl02cel06 UNKNOWN
RECO_CD_05_atl02cel06 UNKNOWN
RECO_CD_06_atl02cel06 UNKNOWN
RECO_CD_07_atl02cel06 UNKNOWN
RECO_CD_08_atl02cel06 UNKNOWN
RECO_CD_09_atl02cel06 UNKNOWN
RECO_CD_10_atl02cel06 UNKNOWN
RECO_CD_11_atl02cel06 UNKNOWN
SYSTEMDG_CD_02_atl02cel06 UNKNOWN
SYSTEMDG_CD_03_atl02cel06 UNKNOWN
SYSTEMDG_CD_04_atl02cel06 UNKNOWN
SYSTEMDG_CD_05_atl02cel06 UNKNOWN
SYSTEMDG_CD_06_atl02cel06 UNKNOWN
SYSTEMDG_CD_07_atl02cel06 UNKNOWN
SYSTEMDG_CD_08_atl02cel06 UNKNOWN
SYSTEMDG_CD_09_atl02cel06 UNKNOWN
SYSTEMDG_CD_10_atl02cel06 UNKNOWN
SYSTEMDG_CD_11_atl02cel06 UNKNOWN
-- This should be INACTIVE, when you do "ALTER GRIDDISK ALL INACTIVE"
CellCLI> LIST GRIDDISK
DATA_CD_00_atl02cel06 active
DATA_CD_01_atl02cel06 active
DATA_CD_02_atl02cel06 active
DATA_CD_03_atl02cel06 active
DATA_CD_04_atl02cel06 active
DATA_CD_05_atl02cel06 active
DATA_CD_06_atl02cel06 active
DATA_CD_07_atl02cel06 active
DATA_CD_08_atl02cel06 active
DATA_CD_09_atl02cel06 active
DATA_CD_10_atl02cel06 active
DATA_CD_11_atl02cel06 active
RECO_CD_00_atl02cel06 active
RECO_CD_01_atl02cel06 active
RECO_CD_02_atl02cel06 active
RECO_CD_03_atl02cel06 active
RECO_CD_04_atl02cel06 active
RECO_CD_05_atl02cel06 active
RECO_CD_06_atl02cel06 active
RECO_CD_07_atl02cel06 active
RECO_CD_08_atl02cel06 active
RECO_CD_09_atl02cel06 active
RECO_CD_10_atl02cel06 active
RECO_CD_11_atl02cel06 active
SYSTEMDG_CD_02_atl02cel06 active
SYSTEMDG_CD_03_atl02cel06 active
SYSTEMDG_CD_04_atl02cel06 active
SYSTEMDG_CD_05_atl02cel06 active
SYSTEMDG_CD_06_atl02cel06 active
SYSTEMDG_CD_07_atl02cel06 active
SYSTEMDG_CD_08_atl02cel06 active
SYSTEMDG_CD_09_atl02cel06 active
SYSTEMDG_CD_10_atl02cel06 active
SYSTEMDG_CD_11_atl02cel06 active
5.
-- Check the status of FLASHCACHE
CellCLI> LIST FLASHCACHE detail
Output: -- When have issue.
CellCLI> LIST FLASHCACHE detail
name: atl02cel06_FLASHCACHE
cellDisk: FD_10_atl02cel06,FD_09_atl02cel06,FD_01_atl02cel06,FD_11_atl02cel06,FD_03_atl02cel06,FD_08_atl02cel06,FD_00_atl02cel06,FD_02_atl02cel06
creationTime: 2010-07-14T19:51:22-04:00
degradedCelldisks: FD_13_atl02cel06,FD_14_atl02cel06,FD_07_atl02cel06,FD_15_atl02cel06,FD_05_atl02cel06,FD_06_atl02cel06,FD_04_atl02cel06,FD_12_atl02cel06
effectiveCacheSize: 182.625G
id: 60a179f7-2fdf-44f4-a63d-9cfdd03d42cc
size: 365.25G
status: warning
-- Status should be normal, but here it's warning...
6.
-- We can now drop the flashcache here using the command
CellCLI> drop flashcache all
This will ensure that on startup the flashcache does not start
7.
-- Stop the cell services using the following command:
CellCLI> ALTER CELL SHUTDOWN SERVICES ALL
8.
Unix SA -- REBOOT atl02cel06.(root) for replacing the FLASHCACHE cards
#shutdown -h -y now
The cell services will be started automatically.
---------------------------------------------------
a. Replace the failed flash disk based on the PCI number and FDOM number.
NOTE: You can also use the WWN from the "list physicaldisk" command earlier.
Example: [2:0:0:0] 5080020000fcf12FMOD0 normal
[2:0:0:0] 5080020000f "cf12" FMOD0 normal
The number "cf12" above should be on an orange sticker on the back of each flash pci card.
b. Power up the cell. The cell services will be started automatically.
c. After boot, verify the replaced disks have the same firmware...
/opt/oracle.SupportTools/CheckHWnFWProfile -c strict
-OR-
dmesg | grep -i marvell # Verify all revisions are the same
-- If firmware is not the same across all disks, then do the following...
rm /opt/oracle.cellos/TRIED_FW_UPDATE_ONCE
reboot
---------------------------------------------------
-- Re-create flashcache Steps
9.
-- Configure Flash Cache
CellCLI> drop celldisk all flashdisk force
CellCLI> create celldisk all flashdisk
CellCLI> create flashcache all
CellCLI> LIST FLASHCACHE detail
10.
-- Bring all grid disks online using the following command:
CellCLI> ALTER GRIDDISK ALL ACTIVE
Note:
-- When the grid disks become active, Oracle ASM will automatically synchronize the gird disks to bring them back into the disk group.
-- If ASM is down, need to bring up ASM on atleast one node to see the celldisk "ONLINE", if not celldisk will NOT be in ONLINE until the ASM is up!!
--Validation steps
11.
-- Verify all grid disks have been successfully put online using the following command:
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
Wait until asmmodestatus is ONLINE for all grid disks.
Output:
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
DATA_CD_00_atl02cel06 ONLINE
DATA_CD_01_atl02cel06 ONLINE
DATA_CD_02_atl02cel06 ONLINE
DATA_CD_03_atl02cel06 ONLINE
DATA_CD_04_atl02cel06 ONLINE
DATA_CD_05_atl02cel06 ONLINE
DATA_CD_06_atl02cel06 ONLINE
DATA_CD_07_atl02cel06 ONLINE
DATA_CD_08_atl02cel06 ONLINE
DATA_CD_09_atl02cel06 ONLINE
DATA_CD_10_atl02cel06 ONLINE
DATA_CD_11_atl02cel06 ONLINE
RECO_CD_00_atl02cel06 ONLINE
RECO_CD_01_atl02cel06 ONLINE
RECO_CD_02_atl02cel06 ONLINE
RECO_CD_03_atl02cel06 ONLINE
RECO_CD_04_atl02cel06 ONLINE
RECO_CD_05_atl02cel06 ONLINE
RECO_CD_06_atl02cel06 ONLINE
RECO_CD_07_atl02cel06 ONLINE
RECO_CD_08_atl02cel06 ONLINE
RECO_CD_09_atl02cel06 ONLINE
RECO_CD_10_atl02cel06 ONLINE
RECO_CD_11_atl02cel06 ONLINE
SYSTEMDG_CD_02_atl02cel06 ONLINE
SYSTEMDG_CD_03_atl02cel06 ONLINE
SYSTEMDG_CD_04_atl02cel06 ONLINE
SYSTEMDG_CD_05_atl02cel06 ONLINE
SYSTEMDG_CD_06_atl02cel06 ONLINE
SYSTEMDG_CD_07_atl02cel06 ONLINE
SYSTEMDG_CD_08_atl02cel06 ONLINE
SYSTEMDG_CD_09_atl02cel06 ONLINE
SYSTEMDG_CD_10_atl02cel06 ONLINE
SYSTEMDG_CD_11_atl02cel06 ONLINE
12.
-- Oracle ASM synchronization is only complete when all grid disks show asmmodestatus=ONLINE.
-- Before taking another storage server offline, Oracle ASM synchronization must complete on the restarted Oracle Exadata Storage Server.
-- If synchronization is not complete, then the check performed on another storage server will fail.
CellCLI> list griddisk attributes name where asmdeactivationoutcome != 'Yes'
13.
-- Wait until asmmodestatus shows ONLINE for all grid disks.
-- List the disks and confirm they are all normal.
CellCLI> LIST GRIDDISK ATTRIBUTES asmmodestatus
CellCLI> list physicaldisk
CellCLI> list celldisk
CellCLI> list griddisk
-- All disks are visable after re-creating the flashcache/ compare the output from Step 2
CellCLI> list physicaldisk
16:0 JK1130YAHARN4T normal
16:1 JK1130YAHBWWZT normal
16:2 JK1130YAHBV76T normal
16:3 JK1130YAHBV7TT normal
16:4 JK1130YAH49UJT normal
16:5 JK1130YAHBJ5HT normal
16:6 JK1130YAHBJNHT normal
16:7 JK1130YAHATA3T normal
16:8 JK1130YAHAJX7T normal
16:9 JK1130YAHBJNJT normal
16:10 JK1130YAHBSBZT normal
16:11 JK1130YAHBV7PT normal
[1:0:0:0] 5080020000c47c0FMOD0 normal
[1:0:1:0] 5080020000c47c0FMOD1 normal
[1:0:2:0] 5080020000c47c0FMOD2 normal
[1:0:3:0] 5080020000c47c0FMOD3 normal
[2:0:0:0] 5080020000c47eaFMOD0 normal
[2:0:1:0] 5080020000c47eaFMOD1 normal
[2:0:2:0] 5080020000c47eaFMOD2 normal
[2:0:3:0] 5080020000c47eaFMOD3 normal
[3:0:0:0] 5080020000c47feFMOD0 normal
[3:0:1:0] 5080020000c47feFMOD1 normal
[3:0:2:0] 5080020000c47feFMOD2 normal
[3:0:3:0] 5080020000c47feFMOD3 normal
[4:0:0:0] 5080020000c47b2FMOD0 normal
[4:0:1:0] 5080020000c47b2FMOD1 normal
[4:0:2:0] 5080020000c47b2FMOD2 normal
[4:0:3:0] 5080020000c47b2FMOD3 normal
--
CellCLI> list celldisk
CD_00_atl02cel06 normal
CD_01_atl02cel06 normal
CD_02_atl02cel06 normal
CD_03_atl02cel06 normal
CD_04_atl02cel06 normal
CD_05_atl02cel06 normal
CD_06_atl02cel06 normal
CD_07_atl02cel06 normal
CD_08_atl02cel06 normal
CD_09_atl02cel06 normal
CD_10_atl02cel06 normal
CD_11_atl02cel06 normal
FD_00_atl02cel06 normal
FD_01_atl02cel06 normal
FD_02_atl02cel06 normal
FD_03_atl02cel06 normal
FD_04_atl02cel06 normal
FD_05_atl02cel06 normal
FD_06_atl02cel06 normal
FD_07_atl02cel06 normal
FD_08_atl02cel06 normal
FD_09_atl02cel06 normal
FD_10_atl02cel06 normal
FD_11_atl02cel06 normal
FD_12_atl02cel06 normal
FD_13_atl02cel06 normal
FD_14_atl02cel06 normal
FD_15_atl02cel06 normal
--
CellCLI> list griddisk
DATA_CD_00_atl02cel06 active
DATA_CD_01_atl02cel06 active
DATA_CD_02_atl02cel06 active
DATA_CD_03_atl02cel06 active
DATA_CD_04_atl02cel06 active
DATA_CD_05_atl02cel06 active
DATA_CD_06_atl02cel06 active
DATA_CD_07_atl02cel06 active
DATA_CD_08_atl02cel06 active
DATA_CD_09_atl02cel06 active
DATA_CD_10_atl02cel06 active
DATA_CD_11_atl02cel06 active
RECO_CD_00_atl02cel06 active
RECO_CD_01_atl02cel06 active
RECO_CD_02_atl02cel06 active
RECO_CD_03_atl02cel06 active
RECO_CD_04_atl02cel06 active
RECO_CD_05_atl02cel06 active
RECO_CD_06_atl02cel06 active
RECO_CD_07_atl02cel06 active
RECO_CD_08_atl02cel06 active
RECO_CD_09_atl02cel06 active
RECO_CD_10_atl02cel06 active
RECO_CD_11_atl02cel06 active
SYSTEMDG_CD_02_atl02cel06 active
SYSTEMDG_CD_03_atl02cel06 active
SYSTEMDG_CD_04_atl02cel06 active
SYSTEMDG_CD_05_atl02cel06 active
SYSTEMDG_CD_06_atl02cel06 active
SYSTEMDG_CD_07_atl02cel06 active
SYSTEMDG_CD_08_atl02cel06 active
SYSTEMDG_CD_09_atl02cel06 active
SYSTEMDG_CD_10_atl02cel06 active
SYSTEMDG_CD_11_atl02cel06 active
14.
-- Go to all the cells and run this command:
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
--
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
DATA_CD_00_atl02cel06 ONLINE
DATA_CD_01_atl02cel06 ONLINE
DATA_CD_02_atl02cel06 ONLINE
DATA_CD_03_atl02cel06 ONLINE
DATA_CD_04_atl02cel06 ONLINE
DATA_CD_05_atl02cel06 ONLINE
DATA_CD_06_atl02cel06 ONLINE
DATA_CD_07_atl02cel06 ONLINE
DATA_CD_08_atl02cel06 ONLINE
DATA_CD_09_atl02cel06 ONLINE
DATA_CD_10_atl02cel06 ONLINE
DATA_CD_11_atl02cel06 ONLINE
RECO_CD_00_atl02cel06 ONLINE
RECO_CD_01_atl02cel06 ONLINE
RECO_CD_02_atl02cel06 ONLINE
RECO_CD_03_atl02cel06 ONLINE
RECO_CD_04_atl02cel06 ONLINE
RECO_CD_05_atl02cel06 ONLINE
RECO_CD_06_atl02cel06 ONLINE
RECO_CD_07_atl02cel06 ONLINE
RECO_CD_08_atl02cel06 ONLINE
RECO_CD_09_atl02cel06 ONLINE
RECO_CD_10_atl02cel06 ONLINE
RECO_CD_11_atl02cel06 ONLINE
SYSTEMDG_CD_02_atl02cel06 ONLINE
SYSTEMDG_CD_03_atl02cel06 ONLINE
SYSTEMDG_CD_04_atl02cel06 ONLINE
SYSTEMDG_CD_05_atl02cel06 ONLINE
SYSTEMDG_CD_06_atl02cel06 ONLINE
SYSTEMDG_CD_07_atl02cel06 ONLINE
SYSTEMDG_CD_08_atl02cel06 ONLINE
SYSTEMDG_CD_09_atl02cel06 ONLINE
SYSTEMDG_CD_10_atl02cel06 ONLINE
SYSTEMDG_CD_11_atl02cel06 ONLINE
Subscribe to:
Posts (Atom)