Skip to content

Commit 3b2018f

Browse files
torvaldsgregkh
authored andcommitted
pipe: do FASYNC notifications for every pipe IO, not just state changes
commit fe67f4d upstream. It turns out that the SIGIO/FASYNC situation is almost exactly the same as the EPOLLET case was: user space really wants to be notified after every operation. Now, in a perfect world it should be sufficient to only notify user space on "state transitions" when the IO state changes (ie when a pipe goes from unreadable to readable, or from unwritable to writable). User space should then do as much as possible - fully emptying the buffer or what not - and we'll notify it again the next time the state changes. But as with EPOLLET, we have at least one case (stress-ng) where the kernel sent SIGIO due to the pipe being marked for asynchronous notification, but the user space signal handler then didn't actually necessarily read it all before returning (it read more than what was written, but since there could be multiple writes, it could leave data pending). The user space code then expected to get another SIGIO for subsequent writes - even though the pipe had been readable the whole time - and would only then read more. This is arguably a user space bug - and Colin King already fixed the stress-ng code in question - but the kernel regression rules are clear: it doesn't matter if kernel people think that user space did something silly and wrong. What matters is that it used to work. So if user space depends on specific historical kernel behavior, it's a regression when that behavior changes. It's on us: we were silly to have that non-optimal historical behavior, and our old kernel behavior was what user space was tested against. Because of how the FASYNC notification was tied to wakeup behavior, this was first broken by commits f467a6a and 1b6b26a ("pipe: fix and clarify pipe read/write wakeup logic"), but at the time it seems nobody noticed. Probably because the stress-ng problem case ends up being timing-dependent too. It was then unwittingly fixed by commit 3a34b13 ("pipe: make pipe writes always wake up readers") only to be broken again when by commit 3b84482 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads"). And at that point the kernel test robot noticed the performance refression in the stress-ng.sigio.ops_per_sec case. So the "Fixes" tag below is somewhat ad hoc, but it matches when the issue was noticed. Fix it for good (knock wood) by simply making the kill_fasync() case separate from the wakeup case. FASYNC is quite rare, and we clearly shouldn't even try to use the "avoid unnecessary wakeups" logic for it. Link: https://lore.kernel.org/lkml/20210824151337.GC27667@xsang-OptiPlex-9020/ Fixes: 3b84482 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads") Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Oliver Sang <oliver.sang@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Colin Ian King <colin.king@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1 parent e91da23 commit 3b2018f

1 file changed

Lines changed: 8 additions & 12 deletions

File tree

fs/pipe.c

Lines changed: 8 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -363,10 +363,9 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
363363
* _very_ unlikely case that the pipe was full, but we got
364364
* no data.
365365
*/
366-
if (unlikely(was_full)) {
366+
if (unlikely(was_full))
367367
wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
368-
kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
369-
}
368+
kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
370369

371370
/*
372371
* But because we didn't read anything, at this point we can
@@ -385,12 +384,11 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
385384
wake_next_reader = false;
386385
__pipe_unlock(pipe);
387386

388-
if (was_full) {
387+
if (was_full)
389388
wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
390-
kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
391-
}
392389
if (wake_next_reader)
393390
wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
391+
kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
394392
if (ret > 0)
395393
file_accessed(filp);
396394
return ret;
@@ -565,10 +563,9 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
565563
* become empty while we dropped the lock.
566564
*/
567565
__pipe_unlock(pipe);
568-
if (was_empty) {
566+
if (was_empty)
569567
wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
570-
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
571-
}
568+
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
572569
wait_event_interruptible_exclusive(pipe->wr_wait, pipe_writable(pipe));
573570
__pipe_lock(pipe);
574571
was_empty = pipe_empty(pipe->head, pipe->tail);
@@ -591,10 +588,9 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
591588
* Epoll nonsensically wants a wakeup whether the pipe
592589
* was already empty or not.
593590
*/
594-
if (was_empty || pipe->poll_usage) {
591+
if (was_empty || pipe->poll_usage)
595592
wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
596-
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
597-
}
593+
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
598594
if (wake_next_writer)
599595
wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
600596
if (ret > 0 && sb_start_write_trylock(file_inode(filp)->i_sb)) {

0 commit comments

Comments
 (0)