Atomicity of rename on NFS


Atomicity of rename on NFS

Edward Rolison
Hello fellow NetApp Admins. 
I have a bit of an odd one that I'm trying to troubleshoot - and whilst I'm not sure it's specifically filer related, it's NFS related (and is happening on a filer mount).

What happens is this - there's a process that updates a file and relies on 'rename()' being atomic: a journal is updated, then a new reference-pointer file is created and renamed over the old one. 

The expectation is that this file will always be there - because "rename()" is defined as an atomic operation. 
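For what it's worth, the write-new-then-rename pattern being described looks roughly like this on the writer side (a minimal sketch; the function and file names are made up for illustration):

```python
import os

def publish(path, data):
    """Write data to a temp file, then rename it over the target.

    On a local POSIX filesystem, a reader opening `path` sees either
    the old contents or the new contents - never a missing file.
    """
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # make the new contents durable first
    os.rename(tmp, path)       # atomic replacement of the old file
```

The question in this thread is whether that "never a missing file" guarantee extends to a *different* NFS client observing the directory.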

But that's not quite what I'm getting - I have one NFS client doing its (atomic) rename, and another client (a different NFS host) reading it, and - occasionally - reporting 'no such file or directory'. 

This is causing an operation to fail, which in turn means that someone has to intervene in the process. This operation (and multiple extremely similar ones) happens at 5-minute intervals, and every few days (once a week, maybe?) it fails for this reason; our developers think that should be impossible. As such, it looks like a pretty narrow race condition. 

So what I'm trying to figure out is first off:

- Could this be a NetApp bug? We've moved from 7-Mode to cDOT, and it didn't happen before. On the flip side, though, I have no guarantee that it 'never happened before', because we weren't catching a race condition. (Moving to new tin and improving performance does increase race-condition likelihood, after all.)

- Could this be a kernel bug? We're all on kernel 2.6.32-504.12.2.el6.x86_64 - and whilst we're deploying CentOS 7, the hosts involved aren't on it yet. (But that's potentially also just coincidence, as there are quite a few hosts, and they're all on the same kernel version.) 

- Is it actually impossible for a file A renamed over file B to generate ENOENT on a different client? Specifically, in RFC 3530 we have: "The RENAME operation must be atomic to the client." So the client doing the rename sees an atomic operation - but the expectation is that a separate client will also perceive an 'atomic' change: once the cache is refreshed, the 'new' directory has the new files, and at no point was there 'no such file or directory', because the name referred to either the old file or the newly renamed one. Is this actually a valid thing to think?

This is a bit of a complicated one, and has me clutching at straws a bit - I can't reliably reproduce it: a basic fast-spinning loop script on multiple clients to read-write-rename didn't hit it. I've got pcaps running, hoping to catch it 'in flight' - but I haven't yet managed to catch it happening. Any suggestions would be gratefully received.
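For reference, a minimal local version of that spinning-loop reproducer might look like the sketch below (hypothetical names; on a local filesystem the reader should never see ENOENT, which is exactly the behaviour being assumed of NFS here):

```python
import os
import threading

def writer(path, iterations):
    """Repeatedly rename a fresh temp file over the target name."""
    tmp = path + ".tmp"
    for i in range(iterations):
        with open(tmp, "w") as f:
            f.write(str(i))
        os.rename(tmp, path)   # atomic on a local POSIX filesystem

def reader(path, iterations, errors):
    """Repeatedly open the target name; record any ENOENT hits."""
    for _ in range(iterations):
        try:
            with open(path) as f:
                f.read()
        except FileNotFoundError:
            errors.append(path)
```

Run against a local filesystem, `errors` stays empty; the interesting case is running the two halves on two different NFS clients against the same mount.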




_______________________________________________
Toasters mailing list
[hidden email]
http://www.teaparty.net/mailman/listinfo/toasters

Re: Atomicity of rename on NFS

Steiner, Jeffrey
That sounds like normal behavior with the typical mount options used for NFS. What are you using, exactly? The defaults include several seconds of caching of file and directory data. The act of renaming a file is atomic, but other NFS clients will not be immediately aware of the change unless you have actimeo=0 and noac in the mount options. There are performance consequences to that, but sometimes it's unavoidable. For example, Oracle database clusters using NFS must always have a single consistent image of their data across nodes. That's why they use actimeo=0 and noac. 
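Concretely, the knobs in question are the attribute-cache mount options (server and path below are placeholders; the default cache windows come from nfs(5)):

```shell
# Default behaviour: file attributes cached between acregmin/acdirmin
# (3s/30s) and acregmax/acdirmax (60s) - other clients can see a stale
# view of a directory for up to that long.
mount -t nfs filer1:/vol/data /mnt/data

# Disable attribute and data caching entirely: every lookup goes to the
# server. Consistent across clients, but at a real performance cost.
mount -t nfs -o actimeo=0,noac filer1:/vol/data /mnt/data
```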

Sent from my mobile phone. 

On 28 Dec 2016, at 12:23, Edward Rolison <[hidden email]> wrote:

[...]


Re: Atomicity of rename on NFS

Edward Rolison
I'm pretty much on default options. (So will be caching attributes)

So is this likely a question of attribute caching on the reading client? It's caching an old view of the directory, such that when it issues an open() the file it's targeting doesn't exist any more, despite there being a file in the right place that was replaced via rename()? I'm a little surprised I don't hit this more often, then, since I couldn't reproduce it with a tight loop on a pair of hosts. 

On 28 December 2016 at 12:49, Steiner, Jeffrey <[hidden email]> wrote:
[...]



RE: Atomicity of rename on NFS

andrei.borzenkov@ts.fujitsu.com
In reply to this post by Steiner, Jeffrey

I would expect “Stale NFS handle” if the problem were (another) client caching. But it looks like the other client actually contacts the server and gets “No such file” in response. Multiple resources on the Net suggest that this is a known NFS limitation.

I can think of at least one case where it is possible - if the target file is currently open on the same client that is doing the rename, the client is expected to rename the target to .nfsXXXX to prevent deletion on the server, which opens a window during which the target file is not available.

@Edward, do you see any .nfsXXXX files in the same directory?
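One quick way to check for those silly-renamed leftovers is just to scan for the .nfs prefix (a small helper sketch; the function name is made up):

```python
import os

def find_silly_renames(directory):
    """Return the names of NFS 'silly rename' placeholder files.

    Linux NFS clients create .nfsXXXX files when a file is removed or
    renamed over while a process on that same client still holds it open.
    """
    return sorted(
        name for name in os.listdir(directory)
        if name.startswith(".nfs")
    )
```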

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Steiner, Jeffrey
Sent: Wednesday, December 28, 2016 3:49 PM
To: Edward Rolison
Cc: [hidden email]
Subject: Re: Atomicity of rename on NFS

 

[...]



RE: Atomicity of rename on NFS

Steiner, Jeffrey
In reply to this post by Edward Rolison

Now that I think about it, it is surprisingly rare. I've run into this from time to time, though. A file rename or removal on one client will cause a few "no such file" messages on another client doing a tar operation or something similar.

What's happened is that one client cached the contents of a directory. A particular file is renamed elsewhere, and when the client tries to open that file for reading or writing, it attempts to acquire a file handle and only then discovers the file no longer exists.
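That sequence can be mimicked locally (hypothetical names; the directory listing stands in for the client's cached view):

```python
import errno
import os
import tempfile

def stale_listing_demo():
    """Return the errno hit when opening a name from a stale listing."""
    d = tempfile.mkdtemp()
    path = os.path.join(d, "report.dat")
    with open(path, "w") as f:
        f.write("v1")

    cached = os.listdir(d)[0]        # reader's cached view: 'report.dat'
    os.rename(path, path + ".old")   # meanwhile the file is renamed away
    try:
        open(os.path.join(d, cached))  # only now is the name discovered gone
    except FileNotFoundError as e:
        return e.errno               # ENOENT: 'No such file or directory'
    return 0
```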

 

From: Edward Rolison [mailto:[hidden email]]
Sent: Wednesday, December 28, 2016 2:00 PM
To: Steiner, Jeffrey <[hidden email]>
Cc: [hidden email]
Subject: Re: Atomicity of rename on NFS

 

[...]

 



Re: Atomicity of rename on NFS

Michael Bergman
In reply to this post by andrei.borzenkov@ts.fujitsu.com
Just a comment: race conditions over NFS can be common and severe if you
construct a piece of SW (a system) which inherently assumes perfect [client]
cache coherence. That is not really possible to achieve, and client-side
caching is VERY important - indeed crucial - for the performance of any
distributed file system such as NFS.

(Try Google for "perfect cache coherence" file system and look at the hits
you will get.)

The .nfsXXXX files are residue from such a race condition, in the case where
one client has a file open (an active file handle) and writes to it, and
another client simply deletes that file. When the written data is flushed from
the first client, the file is gone, and that data goes into the .nfsXXXX file
in the dir where client #2 expected the file to be.

It is, unfortunately, quite common that people have totally misunderstood
the semantics of UNIX and NFS in this respect. Many really believe that if
one NFS client has a file open, then no other client can delete it. So they
have no idea what "Stale NFS file handle" means and how easy it is to end up
in that situation if you work with a parallel system (home brew as it often
is) over NFS with many NFS clients involved. It seems easy and
straightforward, but it is not.

There is no mandatory file locking in NFS. Never has been. It's advisory,
and before NFSv4.x it was also auxiliary (the NLM protocol, with its own
ports; it doesn't have very high performance capacity).
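The advisory nature of these locks is easy to demonstrate locally (a sketch with made-up file names; POSIX fcntl locks behave the same way on a local filesystem as on NFS in this respect): a held lock does nothing to a writer that never asks for it.

```python
import fcntl
import os
import tempfile

# Advisory locking only coordinates processes that opt in to the lock.
path = os.path.join(tempfile.mkdtemp(), "shared.dat")
with open(path, "w") as f:
    f.write("original")

holder = open(path, "r+")
fcntl.lockf(holder, fcntl.LOCK_EX)   # exclusive *advisory* lock held

# A writer that never calls lockf() is not blocked at all:
with open(path, "w") as rogue:       # truncates and rewrites despite the lock
    rogue.write("overwritten")

fcntl.lockf(holder, fcntl.LOCK_UN)
holder.close()
```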

If you don't know EXACTLY what you're doing, you will shoot yourself in the
foot.

Regards,
/M


On 2016-12-28 14:10, [hidden email] wrote:

> [...]

AW: Atomicity of rename on NFS

Alexander Griesser-2
Yah, that's a PITA sometimes.
Anyway, customers usually tend to ignore these facts, which is why we sometimes provide a small additional NFS volume with no client-side caching for semaphores or other "realtime"-like files, while the main application resides on an NFS share with caching active.
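That split might look like this in /etc/fstab (server, volume, and mount-point names are made up for illustration):

```shell
# Bulk application data: default (cached) mount, tuned for performance
filer1:/vol/app_data  /mnt/app   nfs  rw,hard,vers=3                 0 0

# Small side volume for semaphore/pointer files: attribute caching off,
# so every client sees changes immediately (at a performance cost)
filer1:/vol/app_sync  /mnt/sync  nfs  rw,hard,vers=3,actimeo=0,noac  0 0
```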

Best,

Alexander Griesser
Head of Systems Operations

ANEXIA Internetdienstleistungs GmbH

E-Mail: [hidden email]
Web: http://www.anexia-it.com 

Registered office Klagenfurt: Feldkirchnerstraße 140, 9020 Klagenfurt
Managing Director: Alexander Windbichler
Company register: FN 289918a | Court of jurisdiction: Klagenfurt | VAT ID: AT U63216601


-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Michael Bergman
Sent: Thursday, 29 December 2016 12:07
To: Toasters <[hidden email]>
Subject: Re: Atomicity of rename on NFS

[...]

Re: Atomicity of rename on NFS

Michael Bergman
In reply to this post by Michael Bergman
One more comment on caching in NFS.

On 2016-12-29 12:06, I wrote:
> Just a comment: race conditions over NFS can be common and severe if you
> construct a piece of SW (a system) which inherently assumes perfect [client]
> Cache Coherence. This is not really possible to achieve, and client side
> caching is VERY important, yes crucial, for performance of any distributed
> file system like NFS is.

The reason Oracle-related NFSv3 mounts can work and perform with client-side
attribute caching turned off (actimeo=0 / noac) is that Oracle controls things
internally itself, caching state and data in ways that satisfy its particular
workload patterns over NFS.

So Oracle takes care of the performance in its own way, one can say. The NFS
client doesn't have to do it.

(Oracle even has its own NFS client stack one can use: it's called dNFS,
Direct NFS, if I remember correctly.)

/M


Re: Atomicity of rename on NFS

Edward Rolison
In reply to this post by Alexander Griesser-2
Agreed - NFS is problematic for all sorts of reasons when you try to use it as a parallel data store. It has worked to date, and the update to our NetApp infrastructure has acted as a catalyst.
If the answer is "we were relying on an assumption that wasn't true", then that would be acceptable, provided I can pull together some solid ammunition.
Or indeed if 'rename() is atomic to the client' only applies to the client issuing the operation, and remote-client behaviour is implementation-specific (and undefined).




On 29 December 2016 at 11:11, Alexander Griesser <[hidden email]> wrote:
Yah, that's a PITA sometimes.
Anyways, customers usually tend to ignore these facts, which is why we sometimes provide them a small additional NFS volume with no client-side caching for semaphores or other "realtime"-like files, while the main application resides on an NFS share with caching active.

Best,

Alexander Griesser
Head of Systems Operations

ANEXIA Internetdienstleistungs GmbH

E-Mail: [hidden email]
Web: http://www.anexia-it.com

Head office address: Feldkirchnerstraße 140, 9020 Klagenfurt
Managing director: Alexander Windbichler
Company register: FN 289918a | Jurisdiction: Klagenfurt | VAT no.: AT U63216601


-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Michael Bergman
Sent: Thursday, 29 December 2016 12:07
To: Toasters <[hidden email]>
Subject: Re: Atomicity of rename on NFS

Just a comment: race conditions over NFS can be common and severe if you construct a piece of SW (a system) which inherently assumes perfect [client] cache coherence. This is not really possible to achieve, and client-side caching is VERY important, yes crucial, for the performance of any distributed file system like NFS.

(Try Google for "perfect cache coherence" file system and look at the hits you will get.)

The .nfsXXXX files are residue from such a race condition, in the case where one client has a file open (an active file handle) and writes to it, and another client simply deletes that file. When the written data is flushed from the first client, the original file is gone, and that data goes into the .nfsXXXX file in the directory where client #2 expected the file to be.

It is, unfortunately, quite common that people have totally misunderstood the semantics of UNIX and NFS in this respect. Many really believe that if one NFS client has a file open, then no other client can delete it. So they have no idea what "Stale NFS file handle" means, or how easy it is to end up in that situation when you work with a parallel system (home-brew, as it often is) over NFS with many NFS clients involved. It seems easy and straightforward, but it is not.

There is no mandatory file locking in NFS. There never has been. Locking is advisory, and before NFSv4.x it was also auxiliary (the NLM system, with its own ports, which doesn't have very high performance capacity).

If you don't know EXACTLY what you're doing, you will shoot yourself in the foot.

Regards,
/M


On 2016-12-28 14:10, [hidden email] wrote:
> I would expect "Stale NFS handle" if the problem was (another) client
> caching. But it looks like (another) client actually contacts server
> and gets "No such file" in response. Multiple resources on Net suggest
> that it is known NFS limitation.
>
> I can think of at least one case when it is possible - if target file
> is currently opened on the same client that is doing rename, client is
> expected to rename target to .nfsXXXX to prevent deletion on server
> which opens up window when target file is not available.
>
> @Edward, do you see any .nfsXXXX files in the same directory?
>
> *From:*[hidden email]
> [mailto:[hidden email]]
> *On Behalf Of *Steiner, Jeffrey
> *Sent:* Wednesday, December 28, 2016 3:49 PM
> *To:* Edward Rolison
> *Cc:* [hidden email]
> *Subject:* Re: Atomicity of rename on NFS
>
> That sounds like normal behavior with the typical mount options used
> for NFS. What are you using exactly? The default includes several
> seconds of caching of file and directory data. The act of renaming a
> file is atomic but other NFS clients will not be immediately aware of
> the change unless you have actimeo=0 and noac in the mount options.
> There are performance consequences for that, but sometimes it's
> unavoidable. For example, Oracle database clusters using NFS must
> always have a single consistent image of the data across nodes. That's why they use actimeo=0 and noac.
>
> Sent from my mobile phone.
>
>
> On 28 Dec 2016, at 12:23, Edward Rolison <[hidden email]
> <mailto:[hidden email]>> wrote:
>
> Hello fellow NetApp Admins.
> I have a bit of an odd one that I'm trying to troubleshoot - and
> whilst I'm not sure it's specifically filer related, it's NFS related
> (and is happening on a filer mount).
>
> What happens is this - there's a process that updates a file, and
> relies on 'rename()' being atomic- a journal is updated, and then
> reference pointer (file) is newly created, and renamed over an old one.
>
> The expectation is that this file will always be there - because
> "rename()" is defined as an atomic operation.
>
> But that's not quite what I'm getting - I have one NFS client doing its
> (atomic) rename. And another client (a different NFS host) reading it,
> and - occasionally - reporting 'no such file or directory'.
>
> This is causing an operation to fail, which in turn means that someone
> has to intervene in the process. This operation (and multiple
> extremely similar ones) happen at 5m intervals, and every few days
> (once a week
> maybe?) it fails for this reason, and our developers think that should
> be impossible. But as such - it looks like a pretty narrow race condition.
> [...]

Re: AW: Atomicity of rename on NFS

Michael Bergman
In reply to this post by Alexander Griesser-2
What you describe below (about a special NFS mount) is not a bad idea IMO,
but it still assumes the users understand what a rendezvous is (in a
real-time SW system) and what a semaphore is for :-)

/M

On 2016-12-29 12:11, Alexander Griesser wrote:

> Yah, that's a PITA sometimes.
> Anyways, customers usually tend to ignore these facts hence why we're
> sometimes providing a small additional NFS volume to them with no client
> side caching for semaphores or other "realtime" like files, whereas the main
> application resides on an NFS share with caching being active.
>
> Best,
>
> Alexander Griesser
> Head of Systems Operations


AW: Atomicity of rename on NFS

Alexander Griesser-2
In reply to this post by Edward Rolison

Yah, that’s all the ammunition you need here.

Every operation performed on an NFS datastore is atomic, but only from the point of view of the client performing it. Caching on the other clients is the problem you’re seeing now, and there’s not really anything you can do about it except either disabling the caching or rewriting the application logic to wait a few seconds on an ENOENT reply and then try again.
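The retry approach can be sketched in a few lines; a minimal example, assuming the reader side can tolerate a short delay (the function name and path are illustrative, not from any real system):

```python
import time

def read_with_retry(path, attempts=5, delay=0.2):
    """Read a file, retrying briefly on ENOENT.

    Over NFS, a file that another client is atomically replacing can
    transiently appear missing; a short retry papers over that window.
    """
    for attempt in range(attempts):
        try:
            with open(path, "rb") as f:
                return f.read()
        except FileNotFoundError:
            if attempt == attempts - 1:
                raise  # still missing after all retries: treat as a real ENOENT
            time.sleep(delay)

# usage (path is illustrative): data = read_with_retry("/mnt/filer/vol/testfile")
```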

 

Best,

Alexander Griesser
Head of Systems Operations

From: Edward Rolison [mailto:[hidden email]]
Sent: Thursday, 29 December 2016 12:24
To: Alexander Griesser <[hidden email]>
Cc: Michael Bergman <[hidden email]>; Toasters <[hidden email]>
Subject: Re: Atomicity of rename on NFS

[...]

 



AW: AW: Atomicity of rename on NFS

Alexander Griesser-2
In reply to this post by Michael Bergman
Yah, tell me about it - I have this kind of discussion with every new customer on our platforms :)

Best,

Alexander Griesser
Head of Systems Operations


-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Michael Bergman
Sent: Thursday, 29 December 2016 12:34
To: Toasters <[hidden email]>
Subject: Re: AW: Atomicity of rename on NFS

[...]


Re: Atomicity of rename on NFS

Edward Rolison
In reply to this post by Edward Rolison
As a followup on this - I've tracked down the problem, and wanted to say thanks to all the people offering insight - most of it moved me in the right direction. I'm summarising here because it's at least a little interesting.

It boils down to this - on the reading host, my pcap looks like:


79542  10.643148 10.0.0.52 -> 10.0.0.24 NFS 222  ACCESS allowed   testfile  V3 ACCESS Call, FH: 0x76a9a83d, [Check: RD MD XT XE]
79543  10.643286 10.0.0.24 -> 10.0.0.52 NFS 194 0 ACCESS allowed 0600 Regular File testfile NFS3_OK V3 ACCESS Reply (Call In 79542), [Allowed: RD MD XT XE]
79544  10.643335 10.0.0.52 -> 10.0.0.24 NFS 222  ACCESS allowed     V3 ACCESS Call, FH: 0xe0e7db45, [Check: RD LU MD XT DL]
79545  10.643456 10.0.0.24 -> 10.0.0.52 NFS 194 0 ACCESS allowed 0755 Directory  NFS3_OK V3 ACCESS Reply (Call In 79544), [Allowed: RD LU MD XT DL]
79546  10.643487 10.0.0.52 -> 10.0.0.24 NFS 230  LOOKUP    testfile  V3 LOOKUP Call, DH: 0xe0e7db45/testfile
79547  10.643632 10.0.0.24 -> 10.0.0.52 NFS 190 0 LOOKUP  0755 Directory  NFS3ERR_NOENT V3 LOOKUP Reply (Call In 79546) Error: NFS3ERR_NOENT
79548  10.643662 10.0.0.52 -> 10.0.0.24 NFS 230  LOOKUP    testfile  V3 LOOKUP Call, DH: 0xe0e7db45/testfile
79549  10.643814 10.0.0.24 -> 10.0.0.52 NFS 190 0 LOOKUP  0755 Directory  NFS3ERR_NOENT V3 LOOKUP Reply (Call In 79548) Error: NFS3ERR_NOENT

On my writing host - I get:
203306  13.805489  10.0.0.6 -> 10.0.0.24 NFS 246  LOOKUP    .nfs00000000d59701e500001030  V3 LOOKUP Call, DH: 0xe0e7db45/.nfs00000000d59701e500001030
203307  13.805687 10.0.0.24 -> 10.0.0.6  NFS 186 0 LOOKUP  0755 Directory  NFS3ERR_NOENT V3 LOOKUP Reply (Call In 203306) Error: NFS3ERR_NOENT
203308  13.805711  10.0.0.6 -> 10.0.0.24 NFS 306  RENAME    testfile,.nfs00000000d59701e500001030  V3 RENAME Call, From DH: 0xe0e7db45/testfile To DH: 0xe0e7db45/.nfs00000000d59701e500001030
203309  13.805982 10.0.0.24 -> 10.0.0.6  NFS 330 0,0 RENAME  0755,0755 Directory,Directory  NFS3_OK V3 RENAME Reply (Call In 203308)
203310  13.806008  10.0.0.6 -> 10.0.0.24 NFS 294  RENAME    testfile_temp,testfile  V3 RENAME Call, From DH: 0xe0e7db45/testfile_temp To DH: 0xe0e7db45/testfile
203311  13.806254 10.0.0.24 -> 10.0.0.6  NFS 330 0,0 RENAME  0755,0755 Directory,Directory  NFS3_OK V3 RENAME Reply (Call In 203310)
203312  13.806297  10.0.0.6 -> 10.0.0.24 NFS 246  CREATE    testfile_temp  V3 CREATE Call, DH: 0xe0e7db45/testfile_temp Mode: EXCLUSIVE
203313  13.806538 10.0.0.24 -> 10.0.0.6  NFS 354 0,0 CREATE  0755,0755 Regular File,Directory testfile_temp NFS3_OK V3 CREATE Reply (Call In 203312)
203314  13.806560  10.0.0.6 -> 10.0.0.24 NFS 246  SETATTR  0600  testfile_temp  V3 SETATTR Call, FH: 0x4b69a46a
203315  13.806767 10.0.0.24 -> 10.0.0.6  NFS 214 0 SETATTR  0600 Regular File testfile_temp NFS3_OK V3 SETATTR Reply (Call In 203314)


(IPs modified). 

The long and short of it is this: most of the time everything works right, but when the file being overwritten (and deleted) has also been opened for reading by another process, two RENAME operations occur. The NFS client preserves the 'deleted' file to work around the stateless protocol - it has to remain valid to open and unlink a file and continue doing IO to it, and the client-side 'silly rename' to a .nfsXXXX name is how NFS solves that problem.
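The sequence the pcap shows the writing client performing can be reproduced on a local filesystem; a sketch (the .nfsXXXX filename below is made up) showing that between the two renames the target name genuinely does not exist:

```python
import os
import tempfile

# Recreate, on a local directory, the two-rename sequence the pcap shows
# the writing client performing (names below are made up).
d = tempfile.mkdtemp()
target = os.path.join(d, "testfile")
temp = os.path.join(d, "testfile_temp")
silly = os.path.join(d, ".nfs00000000deadbeef")  # stand-in for the .nfsXXXX name

with open(target, "w") as f:
    f.write("old")
with open(temp, "w") as f:
    f.write("new")

# RENAME 1: the still-open target gets moved aside to a .nfsXXXX name...
os.rename(target, silly)

# ...and at this instant 'testfile' does not exist: a LOOKUP arriving from
# another client here is answered with NFS3ERR_NOENT.
assert not os.path.exists(target)

# RENAME 2: the temp file takes the target's place.
os.rename(temp, target)
with open(target) as f:
    assert f.read() == "new"
```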

RENAME remains atomic from the client's perspective, but there is a tiny race window between the two renames, during which a remote client's LOOKUP can get NFS3ERR_NOENT (and return ENOENT to the application) because no file by that name is present.

The reason I had a hard time reproducing this is that it simply doesn't happen in a simplistic single-writer scenario - and doesn't happen often in our environment, because it _also_ requires the file to be open (for reading) at the same time. Adding a pretty simple 'open file; sleep 1;' type loop was what made it occur more reliably/repeatably.

Net result, though, is that NFS doesn't offer any guarantee of rename atomicity to remote clients - only to the client performing the rename operation.
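For reference, the update pattern described at the top of the thread (write a new pointer file, rename it over the old one) is the classic write-then-rename publish. A minimal sketch, with hypothetical names - noting that, per the findings above, it only guarantees atomicity for the renaming client, not for remote NFS readers:

```python
import os
import tempfile

def publish(path, data):
    """Write data to a temp file in the same directory, then rename it
    over the target. The replace is atomic for the client doing it; as
    this thread shows, other NFS clients may still see a brief ENOENT
    window if the old target was held open somewhere."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # make sure the new contents are durable
        os.replace(tmp, path)      # atomic rename-over for this client
    except BaseException:
        os.unlink(tmp)             # clean up the temp file on failure
        raise
```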





On 28 December 2016 at 11:21, Edward Rolison <[hidden email]> wrote:
[...]
This is a bit of a complicated one, and has me clutching at straws a bit - I can't reliably reproduce it - a basic fast-spinning loop script on multiple clients to read-write-rename didn't hit it. I've got pcaps running hoping to catch it 'in flight' - but haven't yet managed to catch it happening. But any suggestions would be gratefully received.




