Tuesday, August 9, 2011

How to find which cell node a SQL statement or session is running on, and what it is doing?


-- Active SQL from session ID: 1271:
UPDATE etl_job_control
   SET start_dt = (SELECT end_dt
                     FROM etl_job_control
                    WHERE job_name = 's_m_network_stg'),
       end_dt = (SELECT DATE '1970-01-01' + ((SELECT MAX (mod_t)
                                                 FROM pin_user.event) / 24 / 60 / 60)
                   FROM DUAL),
       last_update_dt = SYSDATE
 WHERE job_name = :jn;
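
A quick way to pull the active SQL text for a session (a minimal sketch, assuming the same SID/SERIAL# used throughout this post; GV$SQL may return several child cursors, hence the DISTINCT):

-- Active SQL text for SID 1271, SERIAL# 31026
SELECT DISTINCT s.inst_id, s.sql_id, q.sql_text
FROM gv$session s, gv$sql q
WHERE s.inst_id = q.inst_id
AND s.sql_id = q.sql_id
AND s.sid = 1271
AND s.serial# = 31026;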


-- Wait event on the session
atl02db01:ODS1\sys: SQL> select event from gv$session where sid = 1271 and serial# = 31026;

EVENT
--------------------------------
cell multiblock physical read


-- Let's find the details from cell node
LIST ACTIVEREQUEST where sessionID=1271 and sessionSerNumber=31026 detail -- It looks like the session reads from one cell node at any given time to get the data from the grid disks; as shown below, the session moved from cell 09 to cell 10.

atl02cel09:
CellCLI> LIST ACTIVEREQUEST where sessionID=1271 and sessionSerNumber=31026 detail
name: 5382
asmDiskGroupNumber: 3
asmFileIncarnation: 739372987
asmFileNumber: 2518
consumerGroupID: 4866
consumerGroupName: OTHER_GROUPS
dbID: 3108331340
dbName: ODS
dbRequestID: 5382
fileType: Datafile
id: 5382
instanceNumber: 1
ioBytes: 524288
ioBytesSofar: 0
ioGridDisk: DATA_CD_10_atl02cel09
ioOffset: 790293632
ioReason: "BufferCache Read"
ioType: CacheGet
objectNumber: 8104218
parentID: 5382
requestState: "Queued for Disk Read"
sessionID: 1271
sessionSerNumber: 31026
sqlID: 7p843mkscna11
tableSpaceNumber: 197

atl02cel10:
CellCLI> LIST ACTIVEREQUEST where sessionID=1271 and sessionSerNumber=31026 detail
name: 2301
asmDiskGroupNumber: 3
asmFileIncarnation: 731923539
asmFileNumber: 2059
consumerGroupID: 4866
consumerGroupName: OTHER_GROUPS
dbID: 3108331340
dbName: ODS
dbRequestID: 2301
fileType: Datafile
id: 2301
instanceNumber: 1
ioBytes: 376832
ioBytesSofar: 0
ioGridDisk: DATA_CD_01_atl02cel10
ioOffset: 669064480
ioReason: "BufferCache Read"
ioType: CacheGet
objectNumber: 8104222
parentID: 2301
requestState: "Queued for Disk Read"
sessionID: 1271
sessionSerNumber: 31026
sqlID: 7p843mkscna11
tableSpaceNumber: 197

-- To see all the active requests from instance 1 for database ODS
CellCLI> LIST ACTIVEREQUEST where instanceNumber=1 and dbName=ODS
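
To avoid logging in to every cell one by one, the same CellCLI command can be pushed to all cells with dcli (a hedged sketch; it assumes a cell_group file listing the cell hostnames and ssh equivalence for the celladmin user):

dcli -g cell_group -l celladmin "cellcli -e list activerequest where sessionID=1271 and sessionSerNumber=31026"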

Sunday, July 31, 2011

Exadata Smart Scan (cell offload processing), Exadata Smart Flash Cache and Exadata Storage Index Capabilities


a).
-- Exadata Smart Scan (cell offload processing)


Exadata Smart Scan processes queries at the storage layer, returning only relevant rows and columns to the database server.
As a result, much less data travels over the 40 Gb/s InfiniBand interconnect, dramatically improving the performance and concurrency of both simple and complex queries.


-- Which predicates are supported by a cell? What happens when predicates contain functions or expressions?

a) Numeric functions: they can all be offloaded, with a single exception, the
WIDTH_BUCKET function. For example, the predicate "width_bucket(n,1,10,100) = 1"
is not offloaded.

b) Character functions returning character values: they can all be offloaded.

c) Character functions returning number values: they can all be offloaded.

d) Datetime functions:
The DATE datatype is supported.
When a TIMESTAMP datatype is involved, offloading almost never happens.
While predicates like "d = sysdate" (where "d" is a column of DATE datatype)
and "add_months(d,1) = to_date('01-01-2010','DD-MM-YYYY')" can be offloaded,
something like "add_months(d,1) = sysdate" cannot.
In short, every datetime function can be offloaded provided it is not used along with
SYSDATE or CURRENT_DATE.

Note: There is no synchronization between the database layer and the storage layer for the purpose of offloading datetime functions; whatever they did, it would be impossible to keep their clocks exactly synchronized, so inconsistent results might be generated. Therefore, IMO, the date/time for things like SYSDATE must be provided by the instance where the SQL statement is started.

For example, applying the rule above, only the last of these three predicates can be offloaded:
months_between(d,sysdate) = 0
months_between(d,current_date) = 0
months_between(d,to_date('01-01-2010','DD-MM-YYYY')) = 0

e) NULL-related functions: they can all be offloaded.

-- The list of functions that support offloading is available through the V$SQLFN_METADATA view.
SELECT offloadable, count(DISTINCT name)
FROM v$sqlfn_metadata
GROUP BY offloadable;
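
To see the individual function names rather than just the counts, the same view can be filtered on the OFFLOADABLE column (a minimal sketch; in 11.2 the column holds 'YES'/'NO'):

SELECT DISTINCT name
FROM v$sqlfn_metadata
WHERE offloadable = 'YES'
ORDER BY name;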

-- Smart Scan Disablers
IOT
Clustered Tables
LOB
Rowdependencies Enabled

-- You can use the below query to see the "cell physical IO interconnect bytes returned by smart scan" in bytes
SELECT a.NAME, b.VALUE
FROM v$sysstat a, v$mystat b
WHERE a.statistic# = b.statistic#
AND ( a.NAME IN
('physical read total bytes', 'physical write total bytes',
'cell IO uncompressed bytes')
OR a.NAME LIKE 'cell physical%'
);
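
Offload can also be checked per statement through V$SQL (a hedged sketch using the sql_id from the previous post; the IO_CELL_* columns are present in 11.2):

SELECT sql_id,
       io_cell_offload_eligible_bytes,
       io_cell_offload_returned_bytes,
       io_interconnect_bytes
FROM v$sql
WHERE sql_id = '7p843mkscna11';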


b).
-- How to create and drop Exadata Smart Flash Cache?


-- Drop flashcache (To flush Exadata Smart Flash Cache).
CellCLI> DROP FLASHCACHE

-- Create flashcache on all disks.
CellCLI> CREATE FLASHCACHE ALL

-- Flash cache cell disk and size details.
CellCLI> LIST FLASHCACHE detail
name: atlcel01_FLASHCACHE
cellDisk: FD_10_atlcel01,FD_00_atlcel01,FD_04_atlcel01,FD_14_atlcel01,FD_15_atlcel01,FD_02_atlcel01,FD_08_atlcel01,FD_07_atlcel01,FD_05_atlcel01,FD_09_atlcel01,FD_13_atlcel01,FD_03_atlcel01,FD_12_atlcel01,FD_01_atlcel01,FD_11_atlcel01,FD_06_atlcel01
creationTime: 2011-03-27T03:18:23-04:00
degradedCelldisks:
effectiveCacheSize: 365.25G
id: 40ef10f1-8c82-4419-b5a1-9d51706813fe
size: 365.25G
status: normal


-- To determine whether an object is currently being kept in the Exadata Smart Flash Cache

-- Get the object id
atldb01:ODS1\sys: SQL> select object_id, object_type from dba_objects where object_id = 8104207;

OBJECT_ID OBJECT_TYPE
---------- -------------------
8104207 TABLE PARTITION


-- Look at the segment statistics and get the DATAOBJ#
atldb01:ODS1\sys: SQL> select distinct inst_id, owner, object_name, ts#, obj#, dataobj# from gv$segment_statistics
2 where owner = 'PIN_USER'
3 and object_name = 'EVENT';

INST_ID OWNER OBJECT_NAME TS# OBJ# DATAOBJ#
---------- ------------------------------ ------------------------------ ---------- ---------- ----------
2 PIN_USER EVENT 197 8104207 8104207
1 PIN_USER EVENT 197 8104202 8104202
3 PIN_USER EVENT 197 8104201 8104201
2 PIN_USER EVENT 197 8104218 8104218
2 PIN_USER EVENT 197 8104221 8104221
2 PIN_USER EVENT 197 8104212 8104212
2 PIN_USER EVENT 197 8104215 8104215
2 PIN_USER EVENT 197 8104227 8104227
1 PIN_USER EVENT 197 8104218 8104218
1 PIN_USER EVENT 197 8104212 8104212
1 PIN_USER EVENT 197 8104205 8104205


-- List the object details from Flash Cache.
CellCLI> LIST FLASHCACHECONTENT where objectNumber=8104207 and tableSpaceNumber=197 and dbUniqueName=ODS detail
cachedKeepSize: 0
cachedSize: 4194304
dbID: 3108331340
dbUniqueName: ODS
hitCount: 5972
missCount: 8820
objectNumber: 8104207
tableSpaceNumber: 197

Note: A DBA can pin an object in the flash cache by setting its CELL_FLASH_CACHE storage attribute to KEEP. The default value of this attribute is DEFAULT.
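
A hedged example of pinning the PIN_USER.EVENT table referenced above (the attribute can also be set per partition, or set back to DEFAULT to release it):

ALTER TABLE pin_user.event STORAGE (CELL_FLASH_CACHE KEEP);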


When the object's data is being accessed from the Flash Cache, you should see a high number of optimized requests ("physical read requests optimized" and "cell flash cache read hits") relative to the total number of I/O requests ("physical read total IO requests").

-- You can use the below query to see the "flash cache read hits" in bytes.
SELECT a.NAME, b.VALUE
FROM v$sysstat a, v$mystat b
WHERE a.statistic# = b.statistic#
AND ( a.NAME LIKE '%flash cache read hits'
OR a.NAME LIKE 'cell physical%'
OR a.NAME LIKE 'physical read tot%'
OR a.NAME LIKE 'physical read req%'
);


c).
-- Storage Index Capabilities


Storage Indexes are memory structures which do not persist when the Exadata cells are restarted. They are dynamically built when tables are referenced for the first time after the cells restart.


-- You can use the query below to see the storage index savings in MB.
SELECT a.NAME, b.VALUE / 1024 / 1024 value_in_mb
FROM v$sysstat a, v$mystat b
WHERE a.statistic# = b.statistic#
AND ( a.NAME IN
('physical read total bytes', 'physical write total bytes',
'cell IO uncompressed bytes')
OR a.NAME LIKE 'cell physical%'
);

Value of "cell physical IO bytes saved by storage index" shows the savings of the physical I/O by storage index and the total Physical I/O is recorded under "physical read total bytes".

Queries that benefit from storage indexes execute more quickly and use fewer resources, which frees up I/O capacity for other workloads.

Sunday, July 10, 2011

How to set up notifications on a cell node:



-- Setting up notification on a cell
CellCLI> ALTER CELL smtpServer='smtpmail.corp.dwire.com', -
smtpFromAddr='OracleSupport@dwire.com', -
smtpFrom='DWIRE Oracle Support', -
smtpToAddr='OracleSupport@dwire.com,Vijay.Dumpa@dwire.com', -
notificationPolicy='critical,warning,clear', -
notificationMethod='mail,snmp'


-- List the notification details
CellCLI> list cell detail
name: atl02cel01
bmcType: IPMI
cellVersion: OSS_11.2.1.2.6_LINUX.X64_100511
cpuCount: 16
fanCount: 12/12
fanStatus: normal
id: 1009XFG023
interconnectCount: 3
interconnect1: bond0
iormBoost: 0.0
ipaddress1: 192.168.10.1/22
kernelVersion: 2.6.18-128.1.16.0.1.el5
makeModel: SUN MICROSYSTEMS SUN FIRE X4275 SERVER SAS
metricHistoryDays: 7
notificationMethod: mail,snmp
notificationPolicy: critical,warning,clear
offloadEfficiency: 3.6M
powerCount: 2/2
powerStatus: normal
smtpFrom: "DWIRE Oracle Support"
smtpFromAddr: OracleSupport@dwire.com
smtpServer: smtpmail.corp.dwire.com
smtpToAddr: OracleSupport@dwire.com,Vijay.Dumpa@dwire.com
snmpSubscriber: host=catl0sm41,port=162,community=celadmin
host=catl0sm58,port=162,community=celadmin
host=catl0sm59,port=162,community=celadmin
host=catl0sm157,port=162,community=celadmin
status: online
temperatureReading: 25.0
temperatureStatus: normal
upTime: 46 days, 19:18
cellsrvStatus: running
msStatus: running
rsStatus: running

CellCLI> exit
quitting


CellCLI> ALTER CELL VALIDATE mail -- A test message is sent using mail configuration.
CellCLI> ALTER CELL VALIDATE mail
Cell atl02cel01 successfully altered

-- Test email content, received in the outlook
Subject: CELL atl02cel01 Test Message
This test e-mail message from Oracle Cell atl02cel01 indicates successful configuration of your e-mail address and mail server.


CellCLI> ALTER CELL VALIDATE configuration -- To verify firmware configuration. (Takes a minute to return)
CellCLI> ALTER CELL VALIDATE configuration
Cell atl02cel01 successfully altered


Note: This needs to be executed on each cell node.

-- How to disable the notification alert?
cellcli -e alter cell notificationMethod=null

-- How to enable the notification alert back?
cellcli -e alter cell notificationMethod='mail,snmp'
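
Since these settings live on each cell, dcli can be used to check (or change) them everywhere in one step. A hedged sketch, assuming a cell_group file listing all the cell hostnames:

dcli -g cell_group -l celladmin "cellcli -e list cell attributes name, notificationMethod, notificationPolicy"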

Exadata Disk Creation and Management:


Exadata Storage Layout:
PHYSICAL DISK -> LUN -> CELL DISK -> GRID DISK -> ASM DISK


a).
-- List LUN and PHYSICAL DISK.
CellCLI> LIST LUN -- List all the LUNs on a cell.
CellCLI> LIST LUN
0_0 0_0 normal
0_1 0_1 normal
0_2 0_2 normal
0_3 0_3 normal
0_4 0_4 normal
0_5 0_5 normal
0_6 0_6 normal
0_7 0_7 normal
0_8 0_8 normal
0_9 0_9 normal
0_10 0_10 normal
0_11 0_11 normal
1_0 1_0 normal
1_1 1_1 normal
1_2 1_2 normal
1_3 1_3 normal
2_0 2_0 normal
2_1 2_1 normal
2_2 2_2 normal
2_3 2_3 normal
4_0 4_0 normal
4_1 4_1 normal
4_2 4_2 normal
4_3 4_3 normal
5_0 5_0 normal
5_1 5_1 normal
5_2 5_2 normal
5_3 5_3 normal


CellCLI> LIST LUN where disktype = harddisk -- List all the hard disks on a cell.
CellCLI> LIST LUN where disktype = harddisk
0_0 0_0 normal
0_1 0_1 normal
0_2 0_2 normal
0_3 0_3 normal
0_4 0_4 normal
0_5 0_5 normal
0_6 0_6 normal
0_7 0_7 normal
0_8 0_8 normal
0_9 0_9 normal
0_10 0_10 normal
0_11 0_11 normal

CellCLI> LIST LUN where disktype = flashdisk -- List all the flash disks on a cell.
CellCLI> LIST LUN where disktype = flashdisk
1_0 1_0 normal
1_1 1_1 normal
1_2 1_2 normal
1_3 1_3 normal
2_0 2_0 normal
2_1 2_1 normal
2_2 2_2 normal
2_3 2_3 normal
4_0 4_0 normal
4_1 4_1 normal
4_2 4_2 normal
4_3 4_3 normal
5_0 5_0 normal
5_1 5_1 normal
5_2 5_2 normal
5_3 5_3 normal


CellCLI> LIST LUN where celldisk = null -- List all the LUNs not associated with a cell disk.
CellCLI> LIST LUN where celldisk = null


CellCLI> LIST LUN where name = 0_7 detail -- Check isSystemLun: FALSE indicates that the LUN is not located on a system disk.
CellCLI> LIST LUN where name = 0_7 detail
name: 0_7
cellDisk: CD_07_atl02cel01
deviceName: /dev/sdh
diskType: HardDisk
id: 0_7
isSystemLun: FALSE
lunAutoCreate: FALSE
lunSize: 1861.712890625G
lunUID: 0_7
physicalDrives: 24:7
raidLevel: 0
lunWriteCacheMode: "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
status: normal


CellCLI> LIST PHYSICALDISK where name = 24:7 detail -- List the physical disk attributes (pass the "physicalDrives" name)
CellCLI> LIST PHYSICALDISK where name = 24:7 detail
name: 24:7
deviceId: 23
diskType: HardDisk
enclosureDeviceId: 24
errMediaCount: 0
errOtherCount: 0
foreignState: false
luns: 0_7
makeModel: "HITACHI H7220AA30SUN2.0T"
physicalFirmware: JKAOA28A
physicalInsertTime: 2010-07-13T21:22:00-04:00
physicalInterface: sata
physicalSerial: JK1130YAHBXL2T
physicalSize: 1862.6559999994934G
slotNumber: 7
status: normal


b).
-- How to create a cell disk?

CellCLI> LIST CELLDISK -- List all the cell disks on a cell node.
CellCLI> LIST CELLDISK
CD_00_atl02cel01 normal
CD_01_atl02cel01 normal
CD_02_atl02cel01 normal
CD_03_atl02cel01 normal
CD_04_atl02cel01 normal
CD_05_atl02cel01 normal
CD_06_atl02cel01 normal
CD_07_atl02cel01 normal
CD_08_atl02cel01 normal
CD_09_atl02cel01 normal
CD_10_atl02cel01 normal
CD_11_atl02cel01 normal
FD_00_atl02cel01 normal
FD_01_atl02cel01 normal
FD_02_atl02cel01 normal
FD_03_atl02cel01 normal
FD_04_atl02cel01 normal
FD_05_atl02cel01 normal
FD_06_atl02cel01 normal
FD_07_atl02cel01 normal
FD_08_atl02cel01 normal
FD_09_atl02cel01 normal
FD_10_atl02cel01 normal
FD_11_atl02cel01 normal
FD_12_atl02cel01 normal
FD_13_atl02cel01 normal
FD_14_atl02cel01 normal
FD_15_atl02cel01 normal

-- Hard Disk
CellCLI> LIST CELLDISK where name = CD_07_atl02cel01 detail
name: CD_07_atl02cel01
comment:
creationTime: 2010-07-14T19:56:09-04:00
deviceName: /dev/sdh
devicePartition: /dev/sdh
diskType: HardDisk
errorCount: 0
freeSpace: 560M
freeSpaceMap: offset=1832.046875G,size=560M
id: 00000129-d363-0901-0000-000000000000
interleaving: none
lun: 0_7
raidLevel: 0
size: 1861.703125G
status: normal

CellCLI> CREATE CELLDISK CD_07_atl02cel01 LUN='0_7' HARDDISK


-- Flash Disk
CellCLI> LIST CELLDISK where name = FD_00_atl02cel01 detail
name: FD_00_atl02cel01
comment:
creationTime: 2011-03-27T03:17:59-04:00
deviceName: /dev/sdq
devicePartition: /dev/sdq
diskType: FlashDisk
errorCount: 0
freeSpace: 0
id: caf0be8d-4061-451a-a9da-090945a9c8d5
interleaving: none
lun: 1_0
size: 22.875G
status: normal

CellCLI> CREATE CELLDISK FD_00_atl02cel01 LUN='1_0' FLASHDISK
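
Instead of creating the cell disks one LUN at a time, CellCLI also accepts a bulk form (a hedged shortcut; it creates a cell disk on every eligible LUN of the given type):

CellCLI> CREATE CELLDISK ALL HARDDISK
CellCLI> CREATE CELLDISK ALL FLASHDISK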


c).
-- How to create a grid disk?
All grid disks are created on cell disks (hard disks, not flash disks).

CellCLI> LIST GRIDDISK -- List all the grid disks on a cell node.
CellCLI> LIST GRIDDISK
DATA_CD_00_atl02cel01 active
DATA_CD_01_atl02cel01 active
DATA_CD_02_atl02cel01 active
DATA_CD_03_atl02cel01 active
DATA_CD_04_atl02cel01 active
DATA_CD_05_atl02cel01 active
DATA_CD_06_atl02cel01 active
DATA_CD_07_atl02cel01 active
DATA_CD_08_atl02cel01 active
DATA_CD_09_atl02cel01 active
DATA_CD_10_atl02cel01 active
DATA_CD_11_atl02cel01 active
RECO_CD_00_atl02cel01 active
RECO_CD_01_atl02cel01 active
RECO_CD_02_atl02cel01 active
RECO_CD_03_atl02cel01 active
RECO_CD_04_atl02cel01 active
RECO_CD_05_atl02cel01 active
RECO_CD_06_atl02cel01 active
RECO_CD_07_atl02cel01 active
RECO_CD_08_atl02cel01 active
RECO_CD_09_atl02cel01 active
RECO_CD_10_atl02cel01 active
RECO_CD_11_atl02cel01 active
SYSTEMDG_CD_02_atl02cel01 active
SYSTEMDG_CD_03_atl02cel01 active
SYSTEMDG_CD_04_atl02cel01 active
SYSTEMDG_CD_05_atl02cel01 active
SYSTEMDG_CD_06_atl02cel01 active
SYSTEMDG_CD_07_atl02cel01 active
SYSTEMDG_CD_08_atl02cel01 active
SYSTEMDG_CD_09_atl02cel01 active
SYSTEMDG_CD_10_atl02cel01 active
SYSTEMDG_CD_11_atl02cel01 active

CellCLI> LIST GRIDDISK where celldisk=CD_07_atl02cel01 detail -- List the grid disks carved out of one cell disk; the offset is the starting point on the physical disk.
CellCLI> LIST GRIDDISK where celldisk=CD_07_atl02cel01 detail
name: DATA_CD_07_atl02cel01
availableTo:
cellDisk: CD_07_atl02cel01
comment:
creationTime: 2010-07-14T19:58:53-04:00
diskType: HardDisk
errorCount: 0
id: 00000129-d365-88f0-0000-000000000000
offset: 32M
size: 1582G
status: active

name: RECO_CD_07_atl02cel01
availableTo:
cellDisk: CD_07_atl02cel01
comment:
creationTime: 2010-08-20T12:27:25-04:00
diskType: HardDisk
errorCount: 0
id: 0000012a-9053-7f16-0000-000000000000
offset: 1582G
size: 250G
status: active

name: SYSTEMDG_CD_07_atl02cel01
availableTo:
cellDisk: CD_07_atl02cel01
comment:
creationTime: 2010-07-14T19:56:50-04:00
diskType: HardDisk
errorCount: 0
id: 00000129-d363-a6e5-0000-000000000000
offset: 1832.59375G
size: 29.109375G
status: active

CellCLI> CREATE GRIDDISK DATA_CD_07_atl02cel01 celldisk=CD_07_atl02cel01,size=1582G,offset=32M -- Where "offset" is the starting point on the disk.
CellCLI> CREATE GRIDDISK RECO_CD_07_atl02cel01 celldisk=CD_07_atl02cel01,size=250G,offset=1582G
CellCLI> CREATE GRIDDISK SYSTEMDG_CD_07_atl02cel01 celldisk=CD_07_atl02cel01,size=29.109375G,offset=1832.59375G
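
The same grid disk layout can be created across all cell disks in one command using the PREFIX form (a hedged sketch using the DATA size shown above):

CellCLI> CREATE GRIDDISK ALL HARDDISK PREFIX=DATA, size=1582G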



d).
-- How to create ASM disks on Exadata?

CellCLI> LIST GRIDDISK attributes name, size, asmmodestatus where asmmodestatus='UNUSED' -- To identify all the unused grid disks to create ASM disks.

asmca
(or)
. oraenv
+ASM1

sqlplus / as sysasm

set linesize 200
col path for a50

-- No CANDIDATE disks.
SQL> select name, header_status, path from v$asm_disk
2 where header_status <> 'MEMBER';

no rows selected

-- Print the existing disk names and their paths.
SQL> select * from (select name, header_status, path from v$asm_disk order by name)
2 where rownum < 16;

NAME HEADER_STATU PATH
------------------------- ------------ --------------------------------------------------
DATA_CD_00_ATL02CEL01 MEMBER o/192.168.10.1/DATA_CD_00_atl02cel01
DATA_CD_00_ATL02CEL02 MEMBER o/192.168.10.2/DATA_CD_00_atl02cel02
DATA_CD_00_ATL02CEL03 MEMBER o/192.168.10.3/DATA_CD_00_atl02cel03
DATA_CD_00_ATL02CEL04 MEMBER o/192.168.10.4/DATA_CD_00_atl02cel04
DATA_CD_00_ATL02CEL05 MEMBER o/192.168.10.5/DATA_CD_00_atl02cel05
DATA_CD_00_ATL02CEL06 MEMBER o/192.168.10.6/DATA_CD_00_atl02cel06
DATA_CD_00_ATL02CEL07 MEMBER o/192.168.10.7/DATA_CD_00_atl02cel07
DATA_CD_00_ATL02CEL08 MEMBER o/192.168.10.8/DATA_CD_00_atl02cel08
DATA_CD_00_ATL02CEL09 MEMBER o/192.168.10.9/DATA_CD_00_atl02cel09
DATA_CD_00_ATL02CEL10 MEMBER o/192.168.10.10/DATA_CD_00_atl02cel10
DATA_CD_01_ATL02CEL01 MEMBER o/192.168.10.1/DATA_CD_01_atl02cel01
DATA_CD_01_ATL02CEL02 MEMBER o/192.168.10.2/DATA_CD_01_atl02cel02
DATA_CD_01_ATL02CEL03 MEMBER o/192.168.10.3/DATA_CD_01_atl02cel03
DATA_CD_01_ATL02CEL04 MEMBER o/192.168.10.4/DATA_CD_01_atl02cel04
DATA_CD_01_ATL02CEL05 MEMBER o/192.168.10.5/DATA_CD_01_atl02cel05

15 rows selected.

col failgroup_type for a15;
SQL> select * from (select name, header_status, path, failgroup, failgroup_type, sector_size from v$asm_disk order by name)
2 where rownum < 16;

NAME HEADER_STATU PATH FAILGROUP FAILGROUP_TYPE SECTOR_SIZE
------------------------- ------------ --------------------------------------------- ----------- ------------------ -------------
DATA_CD_00_ATL02CEL01 MEMBER o/192.168.10.1/DATA_CD_00_atl02cel01 ATL02CEL01 REGULAR 512
DATA_CD_00_ATL02CEL02 MEMBER o/192.168.10.2/DATA_CD_00_atl02cel02 ATL02CEL02 REGULAR 512
DATA_CD_00_ATL02CEL03 MEMBER o/192.168.10.3/DATA_CD_00_atl02cel03 ATL02CEL03 REGULAR 512
DATA_CD_00_ATL02CEL04 MEMBER o/192.168.10.4/DATA_CD_00_atl02cel04 ATL02CEL04 REGULAR 512
DATA_CD_00_ATL02CEL05 MEMBER o/192.168.10.5/DATA_CD_00_atl02cel05 ATL02CEL05 REGULAR 512
DATA_CD_00_ATL02CEL06 MEMBER o/192.168.10.6/DATA_CD_00_atl02cel06 ATL02CEL06 REGULAR 512
DATA_CD_00_ATL02CEL07 MEMBER o/192.168.10.7/DATA_CD_00_atl02cel07 ATL02CEL07 REGULAR 512
DATA_CD_00_ATL02CEL08 MEMBER o/192.168.10.8/DATA_CD_00_atl02cel08 ATL02CEL08 REGULAR 512
DATA_CD_00_ATL02CEL09 MEMBER o/192.168.10.9/DATA_CD_00_atl02cel09 ATL02CEL09 REGULAR 512
DATA_CD_00_ATL02CEL10 MEMBER o/192.168.10.10/DATA_CD_00_atl02cel10 ATL02CEL10 REGULAR 512
DATA_CD_01_ATL02CEL01 MEMBER o/192.168.10.1/DATA_CD_01_atl02cel01 ATL02CEL01 REGULAR 512
DATA_CD_01_ATL02CEL02 MEMBER o/192.168.10.2/DATA_CD_01_atl02cel02 ATL02CEL02 REGULAR 512
DATA_CD_01_ATL02CEL03 MEMBER o/192.168.10.3/DATA_CD_01_atl02cel03 ATL02CEL03 REGULAR 512
DATA_CD_01_ATL02CEL04 MEMBER o/192.168.10.4/DATA_CD_01_atl02cel04 ATL02CEL04 REGULAR 512
DATA_CD_01_ATL02CEL05 MEMBER o/192.168.10.5/DATA_CD_01_atl02cel05 ATL02CEL05 REGULAR 512

15 rows selected.


col name for a10
col compatibility for a10
col database_compatibility for a22
SQL> select name, allocation_unit_size, block_size, compatibility, database_compatibility from v$asm_diskgroup;

NAME ALLOCATION_UNIT_SIZE BLOCK_SIZE COMPATIBIL DATABASE_COMPATIBILITY
---------- -------------------- ---------- ---------- ----------------------
SYSTEMDG 4194304 4096 11.2.0.0.0 11.2.0.0.0
DATA 4194304 4096 11.2.0.0.0 11.2.0.0.0
RECO 4194304 4096 11.2.0.0.0 11.2.0.0.0


. oraenv
+ASM1

sqlplus / as sysasm

-- Creating the SYSTEMDG diskgroup:
CREATE DISKGROUP SYSTEMDG NORMAL REDUNDANCY
DISK 'o/*/SYSTEMDG*'
ATTRIBUTE 'AU_SIZE' = '4M',
'cell.smart_scan_capable'='TRUE',
'compatible.rdbms'='11.2.0.0',
'compatible.asm'='11.2.0.0';

-- Creating the DATA diskgroup:
CREATE DISKGROUP DATA NORMAL REDUNDANCY
DISK 'o/*/DATA*'
ATTRIBUTE 'AU_SIZE' = '4M',
'cell.smart_scan_capable'='TRUE',
'compatible.rdbms'='11.2.0.0',
'compatible.asm'='11.2.0.0';

-- Creating the RECO diskgroup:
CREATE DISKGROUP RECO NORMAL REDUNDANCY
DISK 'o/*/RECO*'
ATTRIBUTE 'AU_SIZE' = '4M',
'cell.smart_scan_capable'='TRUE',
'compatible.rdbms'='11.2.0.0',
'compatible.asm'='11.2.0.0';
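
If new grid disks show up later (for example after adding a cell), they can be brought into an existing disk group from the ASM instance. A hedged sketch; "atl02cel11" is a hypothetical new cell and the rebalance power value is only an example:

ALTER DISKGROUP DATA ADD DISK 'o/*/DATA*atl02cel11*' REBALANCE POWER 11;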

Sunday, January 2, 2011

Exadata V2 Storage Server (Cell node) Architecture and Management:


Architecture Overview:

Storage Servers:
The Exadata Storage Server is a SAN storage device specifically built for Oracle database use. Each holds 12 SAS or SATA disks (7.2 TB raw per cell with SAS, or 24 TB raw per cell with SATA),
dual Xeon CPUs, dual InfiniBand, and 384 GB of flash memory. Smart Scan support reduces the data that must travel over the InfiniBand network,
while Hybrid Columnar Compression reduces the data footprint by 4 to 50 times, depending on the data and compression mode.

- Each Exadata storage cell has 12 disks.
- Each physical disk is presented as a LUN, on which a cell disk is created.
- A grid disk is a slice of a cell disk.
- A disk group is made up of many grid disks.

- On the first two disks, the first 4 partitions (about 29 GB) are reserved for the system software; these two disks hold mirrored copies of it.
- On the other 10 disks, the matching 29 GB area is not used by the system software; it can be utilized by creating a DBFS file system.
- So Exadata creates a disk group of about 290 GB called SYSTEMDG for the full RAC cluster.
- SYSTEMDG contains the OCR and voting disk information.

Exadata Cell Disk Storage Capacity:
. 12 x 600 GB SAS 15K rpm disks (7.2 TB/cell, 100 TB total) - but the usable size is about 558 GB per disk; a little space is also kept aside for cell disk metadata (~48 MB per disk).
. 12 x 2 TB SATA disks (24 TB/cell, 336 TB total)
. 4 x 96 GB Sun Flash Cards (384 GB/cell, about 5 TB total) - but the CellCLI command 'LIST FLASHCACHE' returns 365.25G,
  because a little space is also kept aside for cell disk metadata on these cards too.


Exadata Disk groups:
DATA (Data)
RECO (Redo logs, Archive logs and Flash Recovery Area)
SYSTEMDG (OCR and Votedisk)


Cell node unique features:
Smart Scans
Hybrid Columnar Compression
Storage Indexes
Flash Cache
Exadata I/O Resource Management in Multi-Database Environments


Background Processes in the Exadata Cell Environment on the database server:
The background processes for the database and Oracle ASM instances in an Exadata
Cell environment are the same as in other environments, except for the following additional background processes:

- diskmon Process - The diskmon process is a fundamental component of Exadata Cell, and is responsible for implementing I/O fencing.

- XDMG Process (Exadata Automation Manager)
Its primary task is to watch for inaccessible disks and cells, and to detect when the disks and cells become accessible.

- XDWK Process (Exadata Automation Worker)
The XDWK process begins when asynchronous actions, such as ONLINE, DROP or ADD for an Oracle ASM disk are requested by the XDMG process.
The XDWK process will stop after 5 minutes of inactivity.

Output:
> ps -ef | egrep "diskmon|xdmg|xdwk"
oracle 4684 4206 0 06:42 pts/1 00:00:00 egrep diskmon|xdmg|xdwk
oracle 10321 1 0 2010 ? 00:38:15 /u01/app/11.2.0/grid/bin/diskmon.bin -d -f
oracle 10858 1 0 2010 ? 00:00:18 asm_xdmg_+ASM1


Unlike ASM, which uses a process architecture borrowed from database instances,
the storage servers have a brand-new set of processes to manage disk I/O. They are:

- RS, the restart service. Performing a similar role to SMON, RS monitors other processes, and automatically restarts them if they fail unexpectedly.
RS also handles planned restarts in conjunction with software updates.
The main cellrssrm process spawns several helper processes, including cellrsbmt, cellrsbkm, cellrsomt, and cellrsssmt.

- MS, the management service. MS is the back-end process that processes configuration and monitoring commands. It communicates with cellcli, described in the next section.
MS is written in Java, unlike the other background processes which are distributed in binary form and are likely written in C.

- CELLSRV, the cell service. CELLSRV handles the actual I/O processing of the storage server.
It is not uncommon to see heavy usage from CELLSRV process threads during periods of heavy load.
Among other things, CELLSRV provides:
. Communication with database nodes using the iDB/RDS protocols over the InfiniBand network
. Disk I/O with the underlying cell disks
. Offload of SQL processing from database nodes
. I/O resource management, prioritizing I/O requests based on a defined policy

- I/O Resource Manager (IORM). Enables storage grid by prioritizing I/Os to ensure predictable performance



Cell node Management Overview:
DBAs log in as the OS user "celladmin" to manage cell nodes.
A cell node does not run an ASM instance to manage its disks (CELLSRV handles the disk I/O), which is why you cannot see an ASM pmon process on the cell node.

Cell administration tools: cellcli and dcli.
Cell monitoring tools: OSWatcher, ORION (an I/O performance benchmarking tool) and ADRCI.


Cell Nodes Logs and Traces:
$ADR_BASE/diag/asm/cell/`hostname`/trace/alert.log
$ADR_BASE/diag/asm/cell/`hostname`/trace/ms-odl.*
$ADR_BASE/diag/asm/cell/`hostname`/trace/svtrc__0.trc -- ps -ef | grep "cellsrv 100"
$ADR_BASE/diag/asm/cell/`hostname`/incident/*

/var/log/messages*, dmesg
/var/log/sa/*
/var/log/cellos/*

cellcli -e list alerthistory

$OSSCONF/cellinit.ora -- #CELL Initialization Parameters
$OSSCONF/cell_disk_config.xml
$OSSCONF/griddisk.owners.dat
$OSSCONF/cell_bootstrap.ora

/opt/oracle/cell/cellsrv/deploy/log/cellcli.lst*

$OSSCONF/alerts.xml
$OSSCONF/metrics/*
oswatcher data

df -h -> check whether the /opt/oracle file system is full; /opt/oracle is only 2GB in size on a cell node !!!

Where:
$OSSCONF is: /opt/oracle/cell11.2.1.3.1_LINUX.X64_100818.1/cellsrv/deploy/config
$ADR_BASE is: /opt/oracle/cell11.2.1.3.1_LINUX.X64_100818.1/log


Cell Check and shutdown/startup commands:
Note: For the full list of commands use: cellcli -e help

cellcli -e alter cell shutdown services all
cellcli -e alter cell startup services all
cellcli -e alter cell shutdown services cellsrv
cellcli -e alter cell restart services cellsrv
cellcli -e list lun detail
cellcli -e list griddisk detail
cellcli -e list celldisk detail
cellcli -e list physicaldisk detail
cellcli -e list flashcache detail
cellcli -e list physicaldisk attributes name, diskType, luns, status
cellcli -e list physicaldisk where disktype=harddisk attributes physicalfirmware
cellcli -e list lun attributes name, diskType, isSystemLun, status

imagehistory (root/sudo)
imageinfo (root/sudo)
service celld status (root/sudo)
lsscsi | grep MARVELL


Smart scan layers:
Smart scan involves multiple layers of code
KDS/KTR/KCBL - data layers in rdbms
KCFIS - smart scan layer in rdbms
Predicate Disk - smart scan layer in cellsrv
Storage index - IO avoidance optimization in cellsrv
Flash IO - IO layer in cellsrv to fetch data from flash cache
Block IO - IO layer in cellsrv to fetch data from hard-disks
FPLIB - filtering library in cellsrv


How to isolate whether or not an issue is Exadata related?
The issue can be:
Wrong results when running a query on an Exadata DB -- I personally faced this issue.
A query is slower when running on an Exadata database.

Is it a smart scan issue?
Set cell_offload_processing=false (default true).
If the problem no longer occurs, it is a smart scan issue.

Is it a FPLIB issue?
Set _kcfis_cell_passthru_enabled=true (default false).
If the problem no longer occurs, it is a FPLIB issue.

Is it a storage index issue?
Set _kcfis_storageidx_disabled=true (default false).
If the problem still occurs, it is not a storage index issue.

Is it a flash cache issue?
For 11.2.0.2, set _kcfis_keep_in_cellfc_enabled=false (default true) to avoid using the flash cache.
For 11.2.0.1, set _kcfis_control1=1 (default 0).
If the problem still occurs, it is not a flash cache problem.
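
A hedged sketch of flipping these switches at the session level for a quick test (underscore parameters need double quotes; use them for diagnosis only):

ALTER SESSION SET cell_offload_processing = FALSE;
ALTER SESSION SET "_kcfis_storageidx_disabled" = TRUE;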

Cell-related database views:
select * from sys.GV_$CELL_STATE;
select * from sys.GV_$CELL;
select * from sys.GV_$CELL_THREAD_HISTORY;
select * from sys.GV_$CELL_REQUEST_TOTALS;
select * from sys.GV_$CELL_CONFIG;


Bloom filters in Exadata:
The concept of bloom filtering was introduced in Oracle 10g.
When two tables are joined via a hash join, the first table (typically the smaller table) is scanned and the rows that satisfy the 'where' clause predicates (for that table) are used to create a hash table.
During the hash table creation, a bit vector or bloom filter is also created based on the join column.
The bit vector is then sent as an additional predicate to the second table scan.
After the 'where' clause predicates have been applied to the second table scan, the resulting rows will have their join column hashed and compared to the values in the bit vector.
If a match is found in the bit vector, that row is sent to the hash join; if no match is found, the row is discarded.
On Exadata the bloom filter or bit vector is passed as an additional predicate, so it is offloaded to the storage cells, making bloom filtering very efficient.

How to Identify a Bloom Filter in an Execution plan:
You can identify a bloom filter in a plan when you see :BF0000 in the Name column of the execution plan.
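
A minimal sketch of checking the last statement run in the current session (DBMS_XPLAN.DISPLAY_CURSOR is standard in 11g; look for :BF0000 in the Name column):

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'BASIC +PREDICATE'));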

To disable the feature, the initialization parameter _bloom_pruning_enabled must be set to FALSE.



Just a few words about the other Exadata components too:

The Sun Oracle Exadata Database Machine hardware consists of preconfigured Oracle Database servers connected to Sun Oracle Exadata Storage Servers
by an InfiniBand fabric. Each of these has been configured to take advantage of the latest advances in Oracle database technology.

Database Servers:
Industry-standard Oracle Database 11gR2 servers feature advanced software that makes the extreme performance of Exadata possible.
Automatic Storage Management (ASM) provides advanced storage management capabilities; the Database Resource Manager (DBRM) lets users
prioritize the resources available to each database; and the Intelligent Database protocol (IDB) allows Smart Scan offloading of database queries
to the Storage Servers, greatly reducing network overhead.

InfiniBand switch:
At 40 Gbit/s on each port, the InfiniBand network linking the Exadata database servers to the Storage Servers is ten times as fast as Fibre Channel,
with lower latency. Multipathing protects against network failures.

Management switch:
A single Cisco Catalyst 4948 48-port gigabit Ethernet switch handles management traffic.

KVM and rack:
One 32-port Avocent KVM switch with associated keyboard/mouse drawer provides console access to database servers and storage cells.
The switch is IP-enabled, meaning remote console access is available either via the individual system ILOM ports or the KVM switch.
All the components are housed in a 42U Sun 1242E rack with integrated zero-U power distribution units.