Document that MailSender.send() returns a Deferred #3478

Open
@Kevinsss

Description

Hi, I'm new to Scrapy and I want to send some emails after the spider closes, but I'm getting errors. Does anyone know why? I'm using Python 2.7 and Scrapy 1.5.1.
Here is my code:

import scrapy
from scrapy import signals
from scrapy.mail import MailSender
from scrapy.utils.project import get_project_settings


class AlertSpider(scrapy.Spider):
    name = "alert"
    start_urls = ['http://www.test.com']
    mails = []

    def parse(self, response):
        # Do some work
        pass

    @classmethod
    def from_crawler(cls, crawler):
        spider = cls()
        crawler.signals.connect(spider.spider_closed, signals.spider_closed)
        return spider

    def spider_closed(self, spider):
        settings = get_project_settings()
        mailer = MailSender.from_settings(settings)
        # first e-mail
        mailer.send(to=["xxxx@gmail.com"], subject='subject1', body='body1')
        # second e-mail
        return mailer.send(to=["xxxx@gmail.com"], subject='subject2', body='body2')

I want to send two e-mails after the spider closes, but I get the errors below.
(By the way, there is no problem if I send just one e-mail.)

  File "C:\Software\Python27\lib\site-packages\twisted\internet\selectreactor.py", line 149, in _doReadOrWrite
    why = getattr(selectable, method)()
  File "C:\Software\Python27\lib\site-packages\twisted\internet\tcp.py", line 243, in doRead
    return self._dataReceived(data)
  File "C:\Software\Python27\lib\site-packages\twisted\internet\tcp.py", line 249, in _dataReceived
    rval = self.protocol.dataReceived(data)
  File "C:\Software\Python27\lib\site-packages\twisted\protocols\tls.py", line 330, in dataReceived
    self._flushReceiveBIO()
  File "C:\Software\Python27\lib\site-packages\twisted\protocols\tls.py", line 300, in _flushReceiveBIO
    self._flushSendBIO()
  File "C:\Software\Python27\lib\site-packages\twisted\protocols\tls.py", line 252, in _flushSendBIO
    bytes = self._tlsConnection.bio_read(2 ** 15)
exceptions.AttributeError: 'NoneType' object has no attribute 'bio_read'

It seems that Twisted doesn't close the I/O properly, but I can't find any close method in the MailSender class.
Has anyone else run into this error?

Activity

appleshowc commented on Nov 5, 2018

I met the same error, but I haven't figured it out.
I sent one email, and it was delivered successfully.
But the same error info shows up in my console.
I'm using Python 3.7 and Scrapy 1.5.1.
Hope someone can fix it.

111qqz commented on Nov 9, 2018

Same problem here.
I try to send emails in the "close_spider" method of my pipeline class, because I have several spiders and I don't want to duplicate the email-sending code.
After I replaced "mailer.send(...)" with "return mailer.send(...)", the problem disappeared. A rough sketch is below.
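
For reference, a minimal sketch of such a pipeline (the class name and addresses here are illustrative, and the mail settings are assumed to come from the project settings):

from scrapy.mail import MailSender


class EmailPipeline(object):

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        pipeline.mailer = MailSender.from_settings(crawler.settings)
        return pipeline

    def close_spider(self, spider):
        # Returning the Deferred lets Scrapy wait for the send to finish
        # before the reactor shuts down.
        return self.mailer.send(to=["xxxx@example.com"], subject='spider finished', body='done')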

111qqz commented on Nov 9, 2018

By the way, I'm using Python 3.7 and Scrapy 1.5.1.

XuCcc commented on Feb 21, 2019

I met the same problem when trying to send an email from a pipeline. It throws the error into the logs, but my email was sent successfully.

The console outputs:

2019-02-21 21:32:58 [scrapy.mail] INFO: Mail sent OK: To=['xxxxxxxx@outlook.com'] Cc=[] Subject="test" Attachs=0
Unhandled Error
Traceback (most recent call last):
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/python/log.py", line 103, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/python/log.py", line 86, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/internet/posixbase.py", line 614, in _doReadOrWrite
    why = selectable.doRead()
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/internet/tcp.py", line 243, in doRead
    return self._dataReceived(data)
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/internet/tcp.py", line 249, in _dataReceived
    rval = self.protocol.dataReceived(data)
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/protocols/tls.py", line 330, in dataReceived
    self._flushReceiveBIO()
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/protocols/tls.py", line 300, in _flushReceiveBIO
    self._flushSendBIO()
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/protocols/tls.py", line 252, in _flushSendBIO
    bytes = self._tlsConnection.bio_read(2 ** 15)
builtins.AttributeError: 'NoneType' object has no attribute 'bio_read'

Python and Scrapy versions:

(converse) xu@xu-ThundeRobot ~/Projects/temp/spider/converse $ python -V
Python 3.6.7
(converse) xu@xu-ThundeRobot ~/Projects/temp/spider/converse $ pipenv graph
Pillow==5.4.1
Scrapy==1.6.0
  - cssselect [required: >=0.9, installed: 1.0.3]
  - lxml [required: Any, installed: 4.3.1]
  - parsel [required: >=1.5, installed: 1.5.1]
    - cssselect [required: >=0.9, installed: 1.0.3]
    - lxml [required: >=2.3, installed: 4.3.1]
    - six [required: >=1.5.2, installed: 1.12.0]
    - w3lib [required: >=1.19.0, installed: 1.20.0]
      - six [required: >=1.4.1, installed: 1.12.0]
  - PyDispatcher [required: >=2.0.5, installed: 2.0.5]
  - pyOpenSSL [required: Any, installed: 19.0.0]
    - cryptography [required: >=2.3, installed: 2.5]
      - asn1crypto [required: >=0.21.0, installed: 0.24.0]
      - cffi [required: >=1.8,!=1.11.3, installed: 1.12.1]
        - pycparser [required: Any, installed: 2.19]
      - six [required: >=1.4.1, installed: 1.12.0]
    - six [required: >=1.5.2, installed: 1.12.0]
  - queuelib [required: Any, installed: 1.5.0]
  - service-identity [required: Any, installed: 18.1.0]
    - attrs [required: >=16.0.0, installed: 18.2.0]
    - cryptography [required: Any, installed: 2.5]
      - asn1crypto [required: >=0.21.0, installed: 0.24.0]
      - cffi [required: >=1.8,!=1.11.3, installed: 1.12.1]
        - pycparser [required: Any, installed: 2.19]
      - six [required: >=1.4.1, installed: 1.12.0]
    - pyasn1 [required: Any, installed: 0.4.5]
    - pyasn1-modules [required: Any, installed: 0.2.4]
      - pyasn1 [required: >=0.4.1,<0.5.0, installed: 0.4.5]
  - six [required: >=1.5.2, installed: 1.12.0]
  - Twisted [required: >=13.1.0, installed: 18.9.0]
    - attrs [required: >=17.4.0, installed: 18.2.0]
    - Automat [required: >=0.3.0, installed: 0.7.0]
      - attrs [required: >=16.1.0, installed: 18.2.0]
      - six [required: Any, installed: 1.12.0]
    - constantly [required: >=15.1, installed: 15.1.0]
    - hyperlink [required: >=17.1.1, installed: 18.0.0]
      - idna [required: >=2.5, installed: 2.8]
    - incremental [required: >=16.10.1, installed: 17.5.0]
    - PyHamcrest [required: >=1.9.0, installed: 1.9.0]
      - setuptools [required: Any, installed: 40.8.0]
      - six [required: Any, installed: 1.12.0]
    - zope.interface [required: >=4.4.2, installed: 4.6.0]
      - setuptools [required: Any, installed: 40.8.0]
  - w3lib [required: >=1.17.0, installed: 1.20.0]
    - six [required: >=1.4.1, installed: 1.12.0]

And my code:

from scrapy.mail import MailSender


class SendEmailPipeLine(object):
    def __init__(self, settings):
        self.mailer = MailSender.from_settings(settings)
        self.pools = []

    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        return cls(settings)

    def process_item(self, item, spider):
        self.pools.append(item)
        return item

    def close_spider(self, spider):
        self.mailer.send('xxxxxxxxxx@outlook.com', 'test', 'asdfghjkjbvcxzqwertyuiop')

Email settings:

MAIL_FROM = 'xxxxxxxxxx@outlook.com'
MAIL_HOST = 'smtp.office365.com'
MAIL_PORT = 587
MAIL_USER = 'xxxxxxxxxxxxx@outlook.com'
MAIL_PASS = 'xxxxxxxxxxxxxx'
MAIL_TLS = True

GloriaXie123 commented on Mar 3, 2019

I have the same issue when I use Scrapy to send email; the email was sent successfully.
I am using Python 3.6 and Scrapy 1.6.
Here is the error stack trace:

Traceback (most recent call last):
  File "D:\graduate\venv\lib\site-packages\twisted\python\log.py", line 103, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "D:\graduate\venv\lib\site-packages\twisted\python\log.py", line 86, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "D:\graduate\venv\lib\site-packages\twisted\python\context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "D:\graduate\venv\lib\site-packages\twisted\python\context.py", line 85, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "D:\graduate\venv\lib\site-packages\twisted\internet\selectreactor.py", line 149, in _doReadOrWrite
    why = getattr(selectable, method)()
  File "D:\graduate\venv\lib\site-packages\twisted\internet\tcp.py", line 243, in doRead
    return self._dataReceived(data)
  File "D:\graduate\venv\lib\site-packages\twisted\internet\tcp.py", line 249, in _dataReceived
    rval = self.protocol.dataReceived(data)
  File "D:\graduate\venv\lib\site-packages\twisted\protocols\tls.py", line 330, in dataReceived
    self._flushReceiveBIO()
  File "D:\graduate\venv\lib\site-packages\twisted\protocols\tls.py", line 300, in _flushReceiveBIO
    self._flushSendBIO()
  File "D:\graduate\venv\lib\site-packages\twisted\protocols\tls.py", line 252, in _flushSendBIO
    bytes = self._tlsConnection.bio_read(2 ** 15)
builtins.AttributeError: 'NoneType' object has no attribute 'bio_read'

easeflyer commented on May 4, 2019

I have the same issue when I use Scrapy to send email; the email was sent successfully.

13240137000 commented on May 5, 2019

Same problem here. When I try to send email via the email module I get the same error. I'm using Python 3.6.2 and Scrapy 1.6.0.

Ksianka commented on Jun 2, 2019

test_spider.py

import scrapy
from scrapy import signals
from scrapy.mail import MailSender
from scrapy.utils.project import get_project_settings


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
    ]
    mails = []

    def __init__(self, *args, **kwargs):
        super(QuotesSpider, self).__init__(*args, **kwargs)

    def parse(self, response):
        pass

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(QuotesSpider, cls).from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
        return spider

    def spider_closed(self):
        settings = get_project_settings()
        mailer = MailSender.from_settings(settings)
        mailer.send(to=["XX@gmail.com"], subject='subject2', body='body2')

Hello,
it looks like the problem lies in the use of Twisted's Deferred class in Scrapy.

MailSender.send() returns a Twisted Deferred object (see line 106 in the scrapy.mail module), with the callbacks _sent_ok and _sent_failed attached for success and failure respectively (line 102 in scrapy.mail).

Calling MailSender.send() in spider_closed produces logs where the spider is closed and then the mail is sent, which looks like the expected behaviour:

2019-06-02 19:54:08 [scrapy.core.engine] INFO: Spider closed (finished)
2019-06-02 19:54:10 [scrapy.mail] INFO: Mail sent OK: To=[‘XXXX@gmail.com'] Cc=[] Subject="subject2" Attachs=0

However, you then get this error in the traceback:

    bytes = self._tlsConnection.bio_read(2 ** 15)
builtins.AttributeError: 'NoneType' object has no attribute 'bio_read'

My explanation of the error:
As far as I understand, the end of the Scrapy crawl triggers the Twisted reactor/main loop shutdown and disconnectAll() while the _sent_ok or _sent_failed callback has not yet been executed.
The callback then tries to communicate over the lost TLS connection.

The error itself is the result of TLSMemoryBIOProtocol.connectionLost(), triggered at the end of the crawl, which sets the _tlsConnection attribute to None (see line 407 of twisted.protocols.tls).
The line self._tlsConnection = None was added to Twisted in March 2018 (see pull request twisted/twisted#955 for reference).
Without that pull request, the same use of MailSender.send() produces no error.

As a workaround, and based on my very limited knowledge of Twisted's Deferred class and Scrapy, I can propose the following:
One way to guarantee that the Twisted reactor/main loop is not shut down before MailSender.send() has finished with its callbacks is to return the resulting Deferred instance. See the example:

def spider_closed(self):
    settings = get_project_settings()
    mailer = MailSender.from_settings(settings)
    return mailer.send(to=["XXXX@gmail.com"], subject='subject2', body='body2')

In this case the reactor/main loop shutdown process will wait.

You can see it in the logs:

2019-06-02 20:00:20 [scrapy.core.engine] INFO: Closing spider (finished)
2019-06-02 20:00:22 [scrapy.mail] INFO: Mail sent OK: To=['XXXX@gmail.com'] Cc=[] Subject="subject2" Attachs=0
2019-06-02 20:00:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
2019-06-02 20:00:22 [scrapy.core.engine] INFO: Spider closed (finished)
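
For the original report, which sends two e-mails from the same handler, both Deferreds would need to complete before shutdown. A minimal sketch (not tested here, and assuming the same MailSender setup as in the example above) could combine them with Twisted's DeferredList and return that:

from twisted.internet.defer import DeferredList

def spider_closed(self):
    settings = get_project_settings()
    mailer = MailSender.from_settings(settings)
    d1 = mailer.send(to=["XXXX@gmail.com"], subject='subject1', body='body1')
    d2 = mailer.send(to=["XXXX@gmail.com"], subject='subject2', body='body2')
    # Returning the DeferredList makes the shutdown wait until both sends
    # (and their _sent_ok/_sent_failed callbacks) have fired.
    return DeferredList([d1, d2])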

My question to the Scrapy maintainers, @Gallaecio: can we consider this workaround a fix and update the documentation for MailSender.send()?
Or can someone keep digging into the Twisted side and propose more substantial adjustments for using Deferred in Scrapy?

beausoft commented on Jun 11, 2019

I have the same issue when I use Scrapy to send email; the email was sent successfully.

Bone117 commented on Jul 24, 2019

# _send_mail() is presumably a helper wrapping MailSender.send();
# this attaches a pass-through callback to the Deferred it returns.
self._send_mail(body, subject).addCallback(lambda x: x)

CarterPape commented on Feb 3, 2020

I can verify that this is still an issue.

The email goes through, but a fatal error gets thrown with the following traceback:

[twisted] CRITICAL: Unhandled Error
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/twisted/python/log.py", line 103, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/usr/local/lib/python3.7/site-packages/twisted/python/log.py", line 86, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/usr/local/lib/python3.7/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python3.7/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/usr/local/lib/python3.7/site-packages/twisted/internet/selectreactor.py", line 149, in _doReadOrWrite
    why = getattr(selectable, method)()
  File "/usr/local/lib/python3.7/site-packages/twisted/internet/tcp.py", line 243, in doRead
    return self._dataReceived(data)
  File "/usr/local/lib/python3.7/site-packages/twisted/internet/tcp.py", line 249, in _dataReceived
    rval = self.protocol.dataReceived(data)
  File "/usr/local/lib/python3.7/site-packages/twisted/protocols/tls.py", line 330, in dataReceived
    self._flushReceiveBIO()
  File "/usr/local/lib/python3.7/site-packages/twisted/protocols/tls.py", line 300, in _flushReceiveBIO
    self._flushSendBIO()
  File "/usr/local/lib/python3.7/site-packages/twisted/protocols/tls.py", line 252, in _flushSendBIO
    bytes = self._tlsConnection.bio_read(2 ** 15)
builtins.AttributeError: 'NoneType' object has no attribute 'bio_read'

iveney commented on Nov 22, 2020

I have an email pipeline that sends email during process_item, and I get the error 'NoneType' object has no attribute 'bio_read'. The code looks something like this:

def process_item(self, item, spider):
  if meets_criteria:
    mailer.send(...)
  return item

Changing the function to async and using await seems to solve it for me, since mailer.send returns a Deferred that can be awaited:

async def process_item(self, item, spider):
  if meets_criteria:
    await mailer.send(...)
  return item

Not sure if this is the right way to solve it, but it seems to be working for me.

brickyang commented on May 20, 2021

Same issue here. @iveney's solution works.

12 remaining items (not shown)
