OnTap read block size?

OnTap read block size?

Rhodes, Richard L.

OnTap 8.1.2p1

 

Our DBAs are complaining that our nSeries (N3220/FAS2240) is reading really slowly because it only returns small 16k blocks.  The DBAs say Oracle's multi-block read-ahead should be reading 128 x 16k blocks = 2m per read, but it only seems to be reading/returning 16k at a time.
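For what it's worth, the arithmetic behind the DBAs' claim itself is sound (a quick sketch; 128 is their claimed db_file_multiblock_read_count, which I haven't confirmed):

```shell
# 128 blocks * 16 KiB per block = expected multiblock read size
mbrc=128
blocksize=$((16 * 1024))
echo $((mbrc * blocksize))              # 2097152 bytes
echo $((mbrc * blocksize / 1048576))    # 2 (MiB)
```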

 

On an AIX filesystem mounted with CIO, if I run

    "dd if=/dev/zero of=z bs=1m count=9999"

I see writes of 500k. 

 

In the same filesystem mounted with CIO, if I read an existing db file

  "dd if=<dbfile> of=/dev/null bs=1m"

I see reads of up to 30k.

 

 

Q) Is there a limit in OnTap on read size?

 

 

Thanks

 

Rick

 



The information contained in this message is intended only for the personal and confidential use of the recipient(s) named above. If the reader of this message is not the intended recipient or an agent responsible for delivering it to the intended recipient, you are hereby notified that you have received this document in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify us immediately, and delete the original message.


_______________________________________________
Toasters mailing list
[hidden email]
http://www.teaparty.net/mailman/listinfo/toasters

Re: OnTap read block size?

Tim McCarthy
7-mode or cDOT? (presume 7-mode)

NFS or LUN (presume NFS)

Were the best practices followed when mounting? There is a NetApp library doc with the appropriate mount options for most UNIX flavors, possibly even a KB.

Is the NetApp configured for large read/write sizes? The default was something like 32K, and you can increase it to 64K (65536).
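If memory serves, on 7-mode these are the per-protocol transfer-size options (a sketch; the option names and defaults below are from memory, so verify them on your release before changing anything):

```
options nfs.tcp.xfersize 65536
options nfs.udp.xfersize 32768
```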

What about using Oracle's NFS stack and bypassing the OS? This is MUCH FASTER!
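For the Direct NFS route, the client is enabled by relinking the Oracle binary (a rough sketch for 11g and later; assumes a standard $ORACLE_HOME layout, and the instance should be shut down first):

```
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk dnfs_on
```

An oranfstab file can then describe the servers, mount points, and paths dNFS should use.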


--tmac

Tim McCarthy, Principal Consultant




RE: OnTap read block size?

Steiner, Jeffrey
In reply to this post by Rhodes, Richard L.

Is this NFS or FC?

 

By default, Oracle does sequential reads in 1M chunks. If they have a 16k block size on the database, it should be reading in units of 64, not 128. Also, just because Oracle tries to read 1MB chunks doesn't mean the database can do that.

 

They really shouldn't be using cio as a mount option either. Any remotely current version of Oracle will open the datafiles with concurrent IO so long as filesystemio_options=setall is set, which it should be anyway.
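To double-check that on the database side, something like this works (a sketch; SHOW PARAMETER is a SQL*Plus command, and the change only takes effect after an instance restart):

```sql
-- current value; SETALL enables both direct and asynchronous I/O
SHOW PARAMETER filesystemio_options

-- set it persistently in the spfile
ALTER SYSTEM SET filesystemio_options = SETALL SCOPE = SPFILE;
```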

 

If you can send me a sample report from 'awrrpt.sql' of no more than one hour elapsed time from a period where they are unhappy with performance, I will take a look at what's going on. I can say with 100% certainty that if they really are doing multiblock reads with 16K units, the problem isn't ONTAP. I suppose it could be a 16K block size on a badly fragmented jfs2 filesystem, but I really doubt it. I think something is being misinterpreted.

 


RE: OnTap read block size?

Rhodes, Richard L.

I've asked a DBA to look at your questions/comments.

 

I'm looking at a blog post http://recoverymonkey.org/2014/09/18/when-competitors-try-too-hard-and-miss-the-point-part-two/

 

It discusses how to read statit output for sequential I/O size. I have a statit listing:

 

 

disk             ut%  xfers  ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/aggrfcp/plex0/rg0:
0b.01.0           54 107.88    0.00   2.11  2211   1.00  34.35   214   1.91  13.48   276 104.97  64.00   188   0.00   ....     .
0b.01.1           55 107.96    0.00   2.11  1684   1.13  30.56   216   1.86  12.90   347 104.97  64.00   192   0.00   ....     .
0b.01.10          56 111.70    4.14   4.76  1852   0.98  29.22   258   1.61   6.35   750 104.97  64.00   195   0.00   ....     .
0b.01.2           56 110.67    4.07   4.72  1814   0.65  43.40   192   0.98   9.70   565 104.97  64.00   200   0.00   ....     .
0b.01.3           56 110.75    4.16   4.72  1856   0.66  43.15   199   0.97  10.01   517 104.97  64.00   201   0.00   ....     .
0b.01.4           57 110.85    4.23   4.71  1751   0.65  42.99   194   1.00   9.96   517 104.97  64.00   206   0.00   ....     .
0b.01.5           57 110.62    4.06   4.97  1770   0.65  43.42   194   0.94  10.15   522 104.97  64.00   210   0.00   ....     .
0b.01.6           57 110.63    4.05   4.82  1764   0.65  43.55   197   0.96   9.83   562 104.97  64.00   210   0.00   ....     .
0b.01.7           57 110.73    4.12   4.61  1853   0.66  43.27   196   0.98   9.13   603 104.97  64.00   217   0.00   ....     .
0b.01.8           57 110.74    4.16   4.72  1844   0.65  43.54   197   0.95   9.18   583 104.97  64.00   218   0.00   ....     .
0b.01.9           57 110.75    4.16   4.76  1819   0.65  43.06   207   0.97   9.13   560 104.97  64.00   223   0.00   ....     .

 

This looks like it's doing sequential reads in 4k I/Os.

I have multiple of these listings and they are all the same.

 

 

rick

 

 

 

 

 

 

 


Re: OnTap read block size?

Peter D. Gray
In reply to this post by Rhodes, Richard L.
On Wed, Jun 29, 2016 at 02:36:29PM +0000, Rhodes, Richard L. wrote:
> OnTap 8.1.2p1
>
> Our DBA's are complaining that our nSeries (N3220/FAS2240) is reading really slow due to it only returning small 16k blocks.  The DBA's are saying the Oracle multi-block read ahead should be reading 128 x 16k blocks = 2m read, but it's only seems to be reading/returning 16k at a time.
>
> On a AIX filesystem mounted CIO, if I run
>     "dd if=/dev/zero of=z bs=1m count=9999"
> I see writes of 500k.
>


I may be wrong, but my understanding is that with NFS there is no such thing as a "block size", at least with V3. Each NFS R/W request specifies a size and offset.

Unless it's past EOF, a read will always return the number of bytes requested, and a write will transfer the bytes in the write request.

On Solaris, any read by a process becomes a 32K NFS read (which I think is the maximum supported by SUNRPC), and read-ahead is 4 "blocks". So a single-byte read results in 128K coming in via NFS, which of course is cached on the client OS. Any read of up to 128K results in 128K transferred from the server.

The block size on the NetApp, which I think is 4K, is irrelevant.

So, in short, it's the local OS that controls reads and writes to the NFS server. The filer just does what it's told.

Please correct me if I am wrong.

Regards,
pdg


Re: OnTap read block size?

jordan slingerland-2
In reply to this post by Rhodes, Richard L.

Please post the mount line from /etc/fstab or /etc/filesystems, etc., along with the client OS version.


RE: OnTap read block size?

Steiner, Jeffrey
In reply to this post by Rhodes, Richard L.

NFS behavior depends on the OS. For example, on Linux, if the application tries to do a 1MB read and you have rsize set to 65536, the OS issues 8 parallel 64KB requests. The ONTAP system will recognize the sequential pattern and start doing readahead.

 

You are indeed showing 16KB IO requests here. The read chain is about 4, which means 4 times 4K blocks.
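Converting a chain value to an average read size is just multiplication by the 4 KiB WAFL block size (a quick sketch using the 4.76 ureads chain from the statit listing):

```shell
# average user-read size = chain length * 4096-byte WAFL blocks
awk 'BEGIN { printf "%.0f\n", 4.76 * 4096 }'   # prints 19497, i.e. roughly 16-20KB per read
```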

 

Are you certain that you don't just have a database with a 16KB block size and you're doing 16KB random reads? If this was sequential IO, the read chain should be a lot larger. I can't think of a realistic scenario where AIX would break a sequential IO operation into a series of 16KB reads by itself.

 

Here's a theory: is someone misreading Oracle IO stats? If you see activity that is primarily db_file_sequential_read, then everything is doing exactly what it's supposed to do, because db_file_sequential_read is random IO. Depending on who you ask, it's either random reads of an index sequence or a sequence of random IO operations. Either way, it's random IO, so if you see a database doing db_file_sequential_read and it has a 16KB block size, that would explain this.

 

Sequential IO is performed as either direct_path_read or db_file_scattered_read. Yes, that means random is sequential and sequential is scattered. Everyone confused yet? Specifically, db_file_scattered_read is a large-block sequential IO operation that is loaded into scattered memory buffers.
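One way to see which of these wait events actually dominates is to query the cumulative wait statistics (a sketch; requires SELECT on the v$ views):

```sql
-- cumulative waits per I/O event since instance startup
SELECT event, total_waits, time_waited_micro
FROM   v$system_event
WHERE  event IN ('db file sequential read',
                 'db file scattered read',
                 'direct path read')
ORDER  BY time_waited_micro DESC;
```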

 

I can't tell you how many times this has caused confusion for DBAs who are certain their IO pattern is random when it's actually sequential, or who think it's sequential when it's actually random.

 

Once you have the AWR we'll have a better idea what's happening. It's not just the IO sizes I'd be looking for, it's the associated latencies and some of the configuration files. If there's no explanation there, we'll have to look at the AIX configuration.

 


Re: OnTap read block size?

Sebastian Goetze

Hi Rick,


in addition to what Jeff said:

What's going on with the GREADs? Is there a RAID rebuild in progress?

That column should be 0 under normal circumstances, and having that load in parallel with your DB load completely messes up the performance picture, IMHO...


Oh, and the 'read_realloc' option on a volume with a "random write/sequential read" load often leads to nice performance improvements over time, dynamically optimizing the DB layout on disk and keeping the volume/file 'defragmented'.
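On 7-mode that is a per-volume option, roughly like this (a sketch from memory; check the options man page on your release):

```
vol options <volname> read_realloc on
```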



Sebastian



RE: OnTap read block size?

Steiner, Jeffrey

In theory, if read_realloc were off and the aggregate were close to 100% full, you could get this kind of IO pattern. I doubt that's happening, but I can't rule it out.

 

I did a test with an all-Flash system where I pretty much puréed an aggregate. In a healthy environment, everything should be nicely allocated, and a sequential read operation should result in huge read chains, like 64x4K blocks read as a unit. I took an aggregate, filled it up to 100%, and then ran about 72 hours of random overwrites. The end result was an array where nothing was contiguous. All the 8K blocks were distributed randomly across all the disks. The read chains during sequential IOs were just 2. That would destroy performance on a system with spinning disk, but surprisingly it had no impact on my all-Flash system. Not a whit. That's part of why there is no read_realloc on AFF systems at this time. It doesn't do anything useful.

 

I had to deliberately misconfigure the system to make that happen, though. I wouldn't expect a real-world environment to get into that situation.

 

From: Sebastian Goetze [mailto:[hidden email]]
Sent: Thursday, June 30, 2016 12:34 PM
To: Steiner, Jeffrey <[hidden email]>; Rhodes, Richard L. <[hidden email]>; [hidden email]
Subject: Re: OnTap read block size?

 

Hi Rick,

 

in addition to what Jeff said:

What's going on with the GREADs? Is there a RAID-rebuild in progress?

That column should be 0 in normal circumstances and having this load in parallel to your DB load completely messes up the performance picture IMHO...

 

Oh, and the 'read_realloc' option on a volume with a "random write/sequential read" load often leads to nice performance improvements over time, dynamically optimizing the DB layout on disk and keeping the volume/file 'defragmented'.

 

 

Sebastian

 

On 6/30/2016 7:33 AM, Steiner, Jeffrey wrote:

NFS behavior depends on the OS. For example, on Linux if the application tries to do a 1MB read and you have an rsize set to 65536, what happens is the OS issues 8 parallel 64KB requests. The ONTAP system will pick up on what's happening and start doing read-ahead.

 

You are indeed showing 16KB IO requests here. The read chain is about 4, which means 4 times 4K blocks.

 

Are you certain that you don't just have a database with a 16KB block size and you're doing 16KB random reads? If this was sequential IO, the read chain should be a lot larger. I can't think of a realistic scenario where AIX would break a sequential IO operation into a series of 16KB reads by itself.

 

Here's a theory - is someone misreading Oracle IO stats? If you see activity that is primarily db_file_sequential_read, then everything is doing exactly what it's supposed to do, because db_file_sequential_read is random IO. Depending on who you ask, it's either random reads of an index sequence or a sequence of random IO operations. Either way, it's random IO, so if you see a database doing db_file_sequential_read and it has a 16KB block size, that would explain this.

 

Sequential IO is performed as either direct_path_read or db_file_scattered_read. Yes, that means random is sequential and sequential is scattered. Everyone confused yet? Specifically, db_file_scattered_read is a large-block sequential IO operation that is loaded into scattered memory buffers.
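To keep that mapping straight, here's a tiny reference table as a Python sketch. The event names are the standard Oracle wait events; the descriptions just restate the paragraphs above:

```python
# Oracle wait-event names vs. the IO pattern they actually represent.
# The names are misleading: "sequential" means random single-block IO,
# and large sequential scans show up as "scattered" or direct path reads.
ORACLE_WAIT_EVENT_IO = {
    "db file sequential read": "random single-block reads (e.g. index lookups)",
    "db file scattered read":  "sequential multiblock reads into scattered buffers",
    "direct path read":        "sequential multiblock reads bypassing the buffer cache",
}

for event, pattern in ORACLE_WAIT_EVENT_IO.items():
    print(f"{event}: {pattern}")
```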

 

I can't tell you how many times this has caused confusion for DBA's who are certain their IO pattern is random when it's actually sequential, or who think it's sequential when it's actually random.

 

Once you have the AWR we'll have a better idea what's happening. It's not just the IO sizes I'd be looking for, it's the associated latencies and some of the configuration files. If there's no explanation there, we'll have to look at the AIX configuration.

 

From: [hidden email] [[hidden email]] On Behalf Of Rhodes, Richard L.
Sent: Wednesday, June 29, 2016 9:07 PM
To: [hidden email]
Subject: RE: OnTap read block size?

 

I've asked a dba to look at your questions/comments.

 

I'm looking at a blog post http://recoverymonkey.org/2014/09/18/when-competitors-try-too-hard-and-miss-the-point-part-two/

 

It discusses how to read a STATIT for sequential I/O size.  I have a statit listing . . .

 

 

disk             ut%  xfers  ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/aggrfcp/plex0/rg0:
0b.01.0           54 107.88    0.00   2.11  2211   1.00  34.35   214   1.91  13.48   276 104.97  64.00   188   0.00   ....     .
0b.01.1           55 107.96    0.00   2.11  1684   1.13  30.56   216   1.86  12.90   347 104.97  64.00   192   0.00   ....     .
0b.01.10          56 111.70    4.14   4.76  1852   0.98  29.22   258   1.61   6.35   750 104.97  64.00   195   0.00   ....     .
0b.01.2           56 110.67    4.07   4.72  1814   0.65  43.40   192   0.98   9.70   565 104.97  64.00   200   0.00   ....     .
0b.01.3           56 110.75    4.16   4.72  1856   0.66  43.15   199   0.97  10.01   517 104.97  64.00   201   0.00   ....     .
0b.01.4           57 110.85    4.23   4.71  1751   0.65  42.99   194   1.00   9.96   517 104.97  64.00   206   0.00   ....     .
0b.01.5           57 110.62    4.06   4.97  1770   0.65  43.42   194   0.94  10.15   522 104.97  64.00   210   0.00   ....     .
0b.01.6           57 110.63    4.05   4.82  1764   0.65  43.55   197   0.96   9.83   562 104.97  64.00   210   0.00   ....     .
0b.01.7           57 110.73    4.12   4.61  1853   0.66  43.27   196   0.98   9.13   603 104.97  64.00   217   0.00   ....     .
0b.01.8           57 110.74    4.16   4.72  1844   0.65  43.54   197   0.95   9.18   583 104.97  64.00   218   0.00   ....     .
0b.01.9           57 110.75    4.16   4.76  1819   0.65  43.06   207   0.97   9.13   560 104.97  64.00   223   0.00   ....     .

 

This looks like it's doing sequential reads in 4k I/O's.

I have multiple of these listings and they are all the same.
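As a sanity check on reading these rows, here's a quick sketch (my own throwaway helper, not a NetApp tool) that turns a statit disk line into an average user-read size. The ureads "chain" column is the average number of 4KB WAFL blocks read per disk IO:

```python
WAFL_BLOCK_KB = 4  # ONTAP reads and writes in 4KB WAFL blocks

def avg_uread_kb(statit_line):
    """Average user-read size (KB) for one statit disk row.

    Columns are: disk, ut%, xfers, then ureads/chain/usecs,
    writes/chain/usecs, cpreads/chain/usecs, greads/chain/usecs, ...
    """
    fields = statit_line.split()
    uread_chain = float(fields[4])  # average blocks per user read
    return uread_chain * WAFL_BLOCK_KB

row = ("0b.01.10          56 111.70    4.14   4.76  1852   0.98  29.22"
       "   258   1.61   6.35   750 104.97  64.00   195   0.00   ....     .")
print(avg_uread_kb(row))  # 19.04 -> roughly 19KB per read, nowhere near 1MB
```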

 

 

rick

From: Steiner, Jeffrey [[hidden email]]
Sent: Wednesday, June 29, 2016 11:33 AM
To: Rhodes, Richard L. <[hidden email]>; [hidden email]
Subject: RE: OnTap read block size?

 

Is this NFS or FC?

 

By default, Oracle does sequential reads in 1M chunks. If they have a 16k block size on the database, it should be reading in units of 64, not 128. Also, just because Oracle tries to read 1MB chunks doesn't mean the database can do that.
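The arithmetic behind that correction, as a small check (the 1MB figure is the default sequential read unit mentioned above):

```python
# With a 1MB maximum read and a 16KB database block size, one Oracle
# multiblock read covers 64 blocks.  The DBAs' "128 x 16k = 2m" figure
# would require a 2MB read unit, which is not the default.
max_read_bytes = 1 * 1024 * 1024
db_block_bytes = 16 * 1024
blocks_per_read = max_read_bytes // db_block_bytes
print(blocks_per_read)  # 64
```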

 

They really shouldn't be using cio as a mount option either. Any remotely current version of Oracle will mount the datafiles with concurrent IO so long as they have filesystemio_options=setall, which is also what they should have.

 

If you can send me a sample report from 'awrrpt.sql' of no more than one hour elapsed time from a period where they are unhappy with performance, I will take a look at what's going on. I can say with 100% certainty that if they really are doing multiblock reads in 16K units, the problem isn't ONTAP. I suppose it could be a 16K block size on a badly fragmented jfs2 filesystem, but I really doubt it. I think something is being misinterpreted.

 

From: [hidden email] [[hidden email]] On Behalf Of Rhodes, Richard L.
Sent: Wednesday, June 29, 2016 4:36 PM
To: [hidden email]
Subject: OnTap read block size?

 

OnTap 8.1.2p1

 

Our DBA's are complaining that our nSeries (N3220/FAS2240) is reading really slowly because it only returns small 16k blocks.  The DBA's say the Oracle multi-block read ahead should be reading 128 x 16k blocks = 2m per read, but it only seems to be reading/returning 16k at a time.

 

On an AIX filesystem mounted CIO, if I run

    "dd if=/dev/zero of=z bs=1m count=9999"

I see writes of 500k. 

 

In the same filesystem mounted CIO, if I read an existing db file

  "dd if=<dbfile> of=/dev/null bs=1m"

I see reads of up to 30k.

 

 

Q) Is there a limit in OnTap on read size?

 

 

Thanks

 

Rick

 



The information contained in this message is intended only for the personal and confidential use of the recipient(s) named above. If the reader of this message is not the intended recipient or an agent responsible for delivering it to the intended recipient, you are hereby notified that you have received this document in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify us immediately, and delete the original message.




RE: OnTap read block size?

Rhodes, Richard L.

We've been doing a bunch of testing.  I was able to show that it wasn't the NetApp storage system at fault.  When a NetApp lun is put on another system it reads/writes big blocks just fine.  I thought this was a VIO problem fragmenting the I/O's.

 

The Unix/DBA folks now think it's highly fragmented AIX filesystems.  They think the db files were copied once and were copied concurrently, causing small jfs2 fragments.  So AIX sees a highly fragmented filesystem and is issuing small block reads.
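One way to test the fragmentation theory directly is AIX's fileplace utility, which shows how a file's logical blocks map onto physical extents (the datafile path below is just an example):

```shell
# Report logical and physical extent layout for one datafile (example path).
# Many small, discontiguous fragments would support the jfs2 fragmentation theory.
fileplace -v /oradata/PROD/system01.dbf
```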

 

Regardless, the NetApp is not the problem!

 
Rick

