Why does I/O multiplexing need to be paired with non-blocking I/O?

Question


Suppose I call select and watch several descriptors; select blocks until one of the events I am watching occurs. Suppose that when a socket becomes readable, selec…


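I had the same question while reading the Tornado source code.

First, under the Reactor model, socket.fd has already been registered with the ioloop (event loop). Once the multiplexer reports the fd as ready, the ioloop calls the corresponding event handler (handle_read/handle_accept).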

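import errno
import socket

def handle_read(sock):
    # Invoked by the ioloop after the multiplexer reports sock as readable.
    sock.setblocking(False)
    while True:
        try:
            data = sock.recv(1024)
        except socket.error as e:
            # EWOULDBLOCK/EAGAIN: the receive buffer is drained.
            if e.args[0] in (errno.EWOULDBLOCK, errno.EAGAIN):
                return
            raise
        if not data:
            return  # empty read: the peer closed the connection

def handle_accept(sock):
    # Invoked by the ioloop after the listening socket is reported readable.
    sock.setblocking(False)
    while True:
        try:
            connection, address = sock.accept()
        except socket.error as e:
            # EWOULDBLOCK/EAGAIN: the accept queue is empty.
            if e.args[0] in (errno.EWOULDBLOCK, errno.EAGAIN):
                return
            raise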

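Suppose the socket's receive buffer already holds enough data that three read calls are needed to drain it, or the accept queue already holds three connections that have completed the handshake.

Non-blocking I/O: call read or accept in a loop until everything has been consumed (signalled by an EWOULDBLOCK error).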
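The drain loop can be tried without a server. The following is a minimal sketch, assuming Python 3 on a Unix-like system; socket.socketpair() stands in for a real connection, and the drain helper and the 3072-byte payload are illustrative choices, not part of Tornado.

```python
import socket

def drain(sock, chunk_size=1024):
    """Read from a non-blocking socket until the kernel buffer is empty."""
    chunks = []
    while True:
        try:
            data = sock.recv(chunk_size)
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: nothing left to read right now.
            break
        if not data:
            break  # empty read: the peer closed the connection
        chunks.append(data)
    return chunks

reader, writer = socket.socketpair()
reader.setblocking(False)
writer.sendall(b"x" * 3072)  # more than a single 1024-byte recv can return

chunks = drain(reader)       # one handler call empties the buffer
total = sum(len(c) for c in chunks)
print(len(chunks), total)

reader.close()
writer.close()
```

A single handler invocation consumes everything that was buffered, and the loop ends with an exception instead of a hang.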

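Blocking I/O: you may call read or accept only once per notification. The multiplexer tells you only that the socket behind the fd is readable, not how much data is available, so handle_read/handle_accept can issue just one read/accept; there is no way to know whether a second call would block. You have to wait for the ioloop's second iteration to report the fd as ready, call handle_read/handle_accept again, and then repeat on the third iteration.

So the latter approach turns out to be much more complicated, and one careless extra call blocks the whole process.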
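For comparison, here is a sketch of the blocking variant, assuming Python 3 and the standard selectors module (level-triggered by default). The socket stays in blocking mode, so the loop performs exactly one recv per readiness notification; the counters are illustrative.

```python
import selectors
import socket

reader, writer = socket.socketpair()  # reader stays in blocking mode
writer.sendall(b"x" * 3072)

sel = selectors.DefaultSelector()
sel.register(reader, selectors.EVENT_READ)

iterations = 0
received = 0
while received < 3072:
    events = sel.select(timeout=1)
    if not events:
        break  # nothing became readable within the timeout
    iterations += 1
    for key, _ in events:
        # One recv per notification: a second call here might block forever.
        received += len(key.fileobj.recv(1024))

print(iterations, received)

sel.unregister(reader)
sel.close()
reader.close()
writer.close()
```

Each recv returns at most 1024 bytes, so the event loop has to go around at least three times to collect the same data.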

======Update======

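In addition, non-blocking I/O is mandatory in the following cases:

With edge-triggered notification (Edge Trigger); see 郭春阳's answer to "Why does I/O multiplexing need to be paired with non-blocking I/O?"
In a multithreaded environment; see 林晓峰's answer to "Why does I/O multiplexing need to be paired with non-blocking I/O?"
When the select bug is triggered; see a Zhihu user's answer to "When using Multiplexed I/O, is it still necessary to use Non Blocking I/O?"

Edited 2016-10-21 13:59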
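The multithreaded case in the list above can be imitated in a single thread. This is a sketch under assumptions: Python 3, a socketpair in place of a real connection, and an ordinary recv call standing in for the second thread that wins the race after select has returned.

```python
import select
import socket

a, b = socket.socketpair()
a.setblocking(False)
b.sendall(b"hello")

# select reports the socket as readable.
readable, _, _ = select.select([a], [], [], 1)
was_reported_readable = a in readable

# Stand-in for another thread that consumes the data first.
stolen = a.recv(1024)

# Our own read now finds nothing. In blocking mode this call would hang;
# in non-blocking mode it fails fast and control returns to the event loop.
try:
    a.recv(1024)
    outcome = "data"
except BlockingIOError:
    outcome = "would block"

print(was_reported_readable, stolen, outcome)

a.close()
b.close()
```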

依云 · Living is not what I wished for, nor is dying. · Accepted Answer

From the BUGS section of man 2 select:

Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block.

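Even if nobody else reads the data away, the kernel may still discard it. And there are other cases that the documentation does not spell out.

Edited 2016-01-16 11:15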
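Following the man page's advice could look like the sketch below. It assumes Python 3 on a Unix-like system (the fcntl module is not available on Windows); try_read is a hypothetical helper name, and the fcntl check is only there to show that setblocking(False) corresponds to the O_NONBLOCK flag.

```python
import fcntl
import os
import socket

def try_read(sock, size=1024):
    """Return the bytes read, or None if the readiness report was spurious."""
    try:
        return sock.recv(size)
    except BlockingIOError:
        return None  # nothing to read after all; go back to the event loop

a, b = socket.socketpair()
a.setblocking(False)
# The non-blocking mode is visible as O_NONBLOCK in the file status flags.
flag_set = bool(fcntl.fcntl(a.fileno(), fcntl.F_GETFL) & os.O_NONBLOCK)

empty_result = try_read(a)  # no data was ever sent: returns instead of hanging
b.sendall(b"ok")
data_result = try_read(a)

print(flag_set, empty_result, data_result)

a.close()
b.close()
```

With the flag set, a spurious "ready" report costs one failed read rather than a stalled process.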