
Document that Mailsender.send() returns a Deferred #3478

Open
Kevinsss opened this issue Oct 30, 2018 · 15 comments

Comments


Kevinsss commented Oct 30, 2018

Hi, I'm new to Scrapy and I want to send some emails after the spider closes, but I get some errors. Does anyone know why? I'm using Python 2.7 and Scrapy 1.5.1.
Here is my code:

class AlertSpider(scrapy.Spider):
    name = "alert"
    start_urls = ['http://www.test.com']
    mails = []

    def parse(self, response):
        # Do some work
        pass

    @classmethod
    def from_crawler(cls, crawler):
        spider = cls()
        crawler.signals.connect(spider.spider_closed, signals.spider_closed)
        return spider

    def spider_closed(self, spider):
        settings = get_project_settings()
        mailer = MailSender.from_settings(settings)
        # first e-mail
        mailer.send(to=["xxxx@gmail.com"], subject='subject1', body='body1')
        # second e-mail
        return mailer.send(to=["xxxx@gmail.com"], subject='subject2', body='body2')

I want to send two e-mails after the spider closes, but I get the errors below.
(By the way, there is no problem if I send just one e-mail.)

File "C:\Software\Python27\lib\site-packages\twisted\internet\selectreactor.py", line 149, in _doReadOrWrite
  why = getattr(selectable, method)()
File "C:\Software\Python27\lib\site-packages\twisted\internet\tcp.py", line 243, in doRead
  return self._dataReceived(data)
File "C:\Software\Python27\lib\site-packages\twisted\internet\tcp.py", line 249, in _dataReceived
  rval = self.protocol.dataReceived(data)
File "C:\Software\Python27\lib\site-packages\twisted\protocols\tls.py", line 330, in dataReceived
  self._flushReceiveBIO()
File "C:\Software\Python27\lib\site-packages\twisted\protocols\tls.py", line 300, in _flushReceiveBIO
  self._flushSendBIO()
File "C:\Software\Python27\lib\site-packages\twisted\protocols\tls.py", line 252, in _flushSendBIO
  bytes = self._tlsConnection.bio_read(2 ** 15)
exceptions.AttributeError: 'NoneType' object has no attribute 'bio_read'

It seems that Twisted doesn't close the IO, but I can't find any close method in the MailSender class.
Has anyone else met this error?

@appleshowc

I met the same error, but I haven't figured it out.
I sent one email and it was sent successfully,
but the same error info appears in my console.
I'm using Python 3.7 and Scrapy 1.5.1.
Hope someone can fix it.


111qqz commented Nov 9, 2018

Same problem here.
I tried to send emails in the close_spider method of a pipeline class, because I have several spiders and didn't want to duplicate the email-sending code.
After I replaced mailer.send(...) with return mailer.send(...), the problem disappeared.
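The pattern described here can be sketched as a minimal pipeline. This is a hedged illustration, not Scrapy's actual code: `mailer` stands for any object whose `send()` returns a Deferred (in a real project, `scrapy.mail.MailSender`), and the address and subject are placeholders.

```python
# Sketch of the fix described above: return the Deferred produced by
# mailer.send() from close_spider so Scrapy waits for the send to finish
# before shutting down the reactor. "mailer" is assumed to be any object
# whose send() returns a Deferred, e.g. scrapy.mail.MailSender.

class SendEmailPipeline:
    def __init__(self, mailer):
        self.mailer = mailer

    def close_spider(self, spider):
        # Returning the Deferred (instead of discarding it) is what lets
        # the engine delay shutdown until the mail has actually been sent.
        return self.mailer.send(
            to=["xxxx@example.com"],
            subject="crawl finished",
            body="spider closed",
        )
```

The only difference from the failing version is the return statement; everything else about the pipeline stays the same.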


111qqz commented Nov 9, 2018

By the way, I'm using Python 3.7 and Scrapy 1.5.1.


XuCcc commented Feb 21, 2019

I met the same problem when I tried to send an email in a pipeline. It throws the error into the logs, but my email was sent successfully.

The console outputs:

2019-02-21 21:32:58 [scrapy.mail] INFO: Mail sent OK: To=['xxxxxxxx@outlook.com'] Cc=[] Subject="test" Attachs=0
Unhandled Error
Traceback (most recent call last):
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/python/log.py", line 103, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/python/log.py", line 86, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/internet/posixbase.py", line 614, in _doReadOrWrite
    why = selectable.doRead()
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/internet/tcp.py", line 243, in doRead
    return self._dataReceived(data)
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/internet/tcp.py", line 249, in _dataReceived
    rval = self.protocol.dataReceived(data)
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/protocols/tls.py", line 330, in dataReceived
    self._flushReceiveBIO()
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/protocols/tls.py", line 300, in _flushReceiveBIO
    self._flushSendBIO()
  File "/home/xu/.local/share/virtualenvs/converse-OK57Cjbh/lib/python3.6/site-packages/twisted/protocols/tls.py", line 252, in _flushSendBIO
    bytes = self._tlsConnection.bio_read(2 ** 15)
builtins.AttributeError: 'NoneType' object has no attribute 'bio_read'

Python and scrapy version:

(converse)  xu@xu-ThundeRobot  ~/Projects/temp/spider/converse  python -V
Python 3.6.7
(converse)  xu@xu-ThundeRobot  ~/Projects/temp/spider/converse  pipenv graph
Pillow==5.4.1
Scrapy==1.6.0
  - cssselect [required: >=0.9, installed: 1.0.3]
  - lxml [required: Any, installed: 4.3.1]
  - parsel [required: >=1.5, installed: 1.5.1]
    - cssselect [required: >=0.9, installed: 1.0.3]
    - lxml [required: >=2.3, installed: 4.3.1]
    - six [required: >=1.5.2, installed: 1.12.0]
    - w3lib [required: >=1.19.0, installed: 1.20.0]
      - six [required: >=1.4.1, installed: 1.12.0]
  - PyDispatcher [required: >=2.0.5, installed: 2.0.5]
  - pyOpenSSL [required: Any, installed: 19.0.0]
    - cryptography [required: >=2.3, installed: 2.5]
      - asn1crypto [required: >=0.21.0, installed: 0.24.0]
      - cffi [required: >=1.8,!=1.11.3, installed: 1.12.1]
        - pycparser [required: Any, installed: 2.19]
      - six [required: >=1.4.1, installed: 1.12.0]
    - six [required: >=1.5.2, installed: 1.12.0]
  - queuelib [required: Any, installed: 1.5.0]
  - service-identity [required: Any, installed: 18.1.0]
    - attrs [required: >=16.0.0, installed: 18.2.0]
    - cryptography [required: Any, installed: 2.5]
      - asn1crypto [required: >=0.21.0, installed: 0.24.0]
      - cffi [required: >=1.8,!=1.11.3, installed: 1.12.1]
        - pycparser [required: Any, installed: 2.19]
      - six [required: >=1.4.1, installed: 1.12.0]
    - pyasn1 [required: Any, installed: 0.4.5]
    - pyasn1-modules [required: Any, installed: 0.2.4]
      - pyasn1 [required: >=0.4.1,<0.5.0, installed: 0.4.5]
  - six [required: >=1.5.2, installed: 1.12.0]
  - Twisted [required: >=13.1.0, installed: 18.9.0]
    - attrs [required: >=17.4.0, installed: 18.2.0]
    - Automat [required: >=0.3.0, installed: 0.7.0]
      - attrs [required: >=16.1.0, installed: 18.2.0]
      - six [required: Any, installed: 1.12.0]
    - constantly [required: >=15.1, installed: 15.1.0]
    - hyperlink [required: >=17.1.1, installed: 18.0.0]
      - idna [required: >=2.5, installed: 2.8]
    - incremental [required: >=16.10.1, installed: 17.5.0]
    - PyHamcrest [required: >=1.9.0, installed: 1.9.0]
      - setuptools [required: Any, installed: 40.8.0]
      - six [required: Any, installed: 1.12.0]
    - zope.interface [required: >=4.4.2, installed: 4.6.0]
      - setuptools [required: Any, installed: 40.8.0]
  - w3lib [required: >=1.17.0, installed: 1.20.0]
    - six [required: >=1.4.1, installed: 1.12.0]

And my code:

class SendEmailPipeLine(object):
    def __init__(self, settings):
        self.mailer = MailSender.from_settings(settings)
        self.pools = []

    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        return cls(settings)

    def process_item(self, item, spider):
        self.pools.append(item)
        return item

    def close_spider(self, spider):
        self.mailer.send('xxxxxxxxxx@outlook.com','test','asdfghjkjbvcxzqwertyuiop')

Email settings:

MAIL_FROM = 'xxxxxxxxxx@outlook.com'
MAIL_HOST = 'smtp.office365.com'
MAIL_PORT = 587
MAIL_USER = 'xxxxxxxxxxxxx@outlook.com'
MAIL_PASS = 'xxxxxxxxxxxxxx'
MAIL_TLS = True


GloriaXie123 commented Mar 3, 2019

I have the same issue when I use Scrapy to send email; the email was sent successfully.
I am using Python 3.6 and Scrapy 1.6.
Here is the error stack trace:

Traceback (most recent call last):
  File "D:\graduate\venv\lib\site-packages\twisted\python\log.py", line 103, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "D:\graduate\venv\lib\site-packages\twisted\python\log.py", line 86, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "D:\graduate\venv\lib\site-packages\twisted\python\context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "D:\graduate\venv\lib\site-packages\twisted\python\context.py", line 85, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "D:\graduate\venv\lib\site-packages\twisted\internet\selectreactor.py", line 149, in _doReadOrWrite
    why = getattr(selectable, method)()
  File "D:\graduate\venv\lib\site-packages\twisted\internet\tcp.py", line 243, in doRead
    return self._dataReceived(data)
  File "D:\graduate\venv\lib\site-packages\twisted\internet\tcp.py", line 249, in _dataReceived
    rval = self.protocol.dataReceived(data)
  File "D:\graduate\venv\lib\site-packages\twisted\protocols\tls.py", line 330, in dataReceived
    self._flushReceiveBIO()
  File "D:\graduate\venv\lib\site-packages\twisted\protocols\tls.py", line 300, in _flushReceiveBIO
    self._flushSendBIO()
  File "D:\graduate\venv\lib\site-packages\twisted\protocols\tls.py", line 252, in _flushSendBIO
    bytes = self._tlsConnection.bio_read(2 ** 15)
builtins.AttributeError: 'NoneType' object has no attribute 'bio_read'

@easeflyer

I have the same issue when I use Scrapy to send email; the email was sent successfully.

@13240137000

Same problem here. When I try to send email via the mail module I get the same error. I'm using Python 3.6.2 and Scrapy 1.6.0.


Ksianka commented Jun 2, 2019

test_spider.py

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
    ]
    mails = []

    def __init__(self, *args, **kwargs):
        super(QuotesSpider, self).__init__(*args, **kwargs)

    def parse(self, response):
        pass

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(QuotesSpider, cls).from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
        return spider

    def spider_closed(self):
        settings = get_project_settings()
        mailer = MailSender.from_settings(settings)
        mailer.send(to=["XX@gmail.com"], subject='subject2', body='body2')

Hello,
it looks like the problem lies in use of Twisted Deferred class in Scrapy.

MailSender.send() returns a Twisted Deferred object (see line 106 in module scrapy.mail) with callbacks _sent_ok and _sent_failed for success and failure respectively (line 102 in scrapy.mail).

Using MailSender.send() in spider_closed generates logs where the spider is closed and then the mail is sent, which looks like the expected behaviour:

2019-06-02 19:54:08 [scrapy.core.engine] INFO: Spider closed (finished)
2019-06-02 19:54:10 [scrapy.mail] INFO: Mail sent OK: To=['XXXX@gmail.com'] Cc=[] Subject="subject2" Attachs=0

However, you get the error in traceback:
builtins.AttributeError: 'NoneType' object has no attribute 'bio_read'
bytes = self._tlsConnection.bio_read(2 ** 15)

My explanation of the error:
As far as I understand, the end of the Scrapy crawler's work triggers the Twisted reactor/main loop shutdown and disconnectAll() before the _sent_ok or _sent_failed callback has executed.
The callback then tries to communicate over the lost TLS connection.

The error itself is the result of TLSMemoryBIOProtocol.connectionLost(), triggered by the end of the crawler's work, where the attribute _tlsConnection is assigned None (see line 407 of twisted.protocols.tls).
The line self._tlsConnection = None was added to Twisted in March 2018 (see pull request twisted/twisted#955 for reference).
Without this pull request, the same MailSender.send() usage produces no error.

As a workaround, and based on my very limited knowledge of the Twisted Deferred class and Scrapy, I can propose the following:
one way to guarantee that the Twisted reactor/main loop is not shut down before MailSender.send() has finished with its callbacks is to return the resulting Deferred instance. See this example:

def spider_closed(self):
    settings = get_project_settings()
    mailer = MailSender.from_settings(settings)
    return mailer.send(to=["XXXX@gmail.com"], subject='subject2', body='body2')

In this case the reactor/main loop shutdown process will wait.

You can see it from logs:

2019-06-02 20:00:20 [scrapy.core.engine] INFO: Closing spider (finished)
2019-06-02 20:00:22 [scrapy.mail] INFO: Mail sent OK: To=['XXXX@gmail.com'] Cc=[] Subject="subject2" Attachs=0
2019-06-02 20:00:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
2019-06-02 20:00:22 [scrapy.core.engine] INFO: Spider closed (finished)

My question to the Scrapy owners, @Gallaecio: can we consider the workaround a fix and change the documentation for MailSender.send()?
Or can someone keep digging into the Twisted world and propose more valuable adjustments for using Deferred in Scrapy?
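Since send() returns a Deferred, a handler can also chain its own callbacks before handing the Deferred back to Scrapy. The sketch below is illustrative only: `mailer` is assumed to expose a `send()` returning a Deferred-like object with addCallback/addErrback, and `log` is a hypothetical collector, not Scrapy API.

```python
# Hedged sketch building on the analysis above: because MailSender.send()
# returns a Deferred, a spider_closed handler can attach its own
# callback/errback for logging and still return the Deferred, so reactor
# shutdown waits and failures are not left unhandled.

def spider_closed_handler(mailer, log):
    d = mailer.send(to=["xxxx@example.com"], subject="done", body="done")
    d.addCallback(lambda result: log.append("mail sent"))
    # An errback returning None swallows the failure, which avoids the
    # "Unhandled Error" noise in the logs.
    d.addErrback(lambda failure: log.append("mail failed"))
    return d  # the engine chains reactor shutdown after this Deferred
```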

@beausoft

I have the same issue when I use Scrapy to send email; the email was sent successfully.


Bone117 commented Jul 24, 2019

self._send_mail(body, subject).addCallback(lambda x: x)

@CarterPape

I can verify that this is still an issue.

The email goes through, but a fatal error gets thrown with the following traceback:

[twisted] CRITICAL: Unhandled Error
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/twisted/python/log.py", line 103, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/usr/local/lib/python3.7/site-packages/twisted/python/log.py", line 86, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/usr/local/lib/python3.7/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python3.7/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/usr/local/lib/python3.7/site-packages/twisted/internet/selectreactor.py", line 149, in _doReadOrWrite
    why = getattr(selectable, method)()
  File "/usr/local/lib/python3.7/site-packages/twisted/internet/tcp.py", line 243, in doRead
    return self._dataReceived(data)
  File "/usr/local/lib/python3.7/site-packages/twisted/internet/tcp.py", line 249, in _dataReceived
    rval = self.protocol.dataReceived(data)
  File "/usr/local/lib/python3.7/site-packages/twisted/protocols/tls.py", line 330, in dataReceived
    self._flushReceiveBIO()
  File "/usr/local/lib/python3.7/site-packages/twisted/protocols/tls.py", line 300, in _flushReceiveBIO
    self._flushSendBIO()
  File "/usr/local/lib/python3.7/site-packages/twisted/protocols/tls.py", line 252, in _flushSendBIO
    bytes = self._tlsConnection.bio_read(2 ** 15)
builtins.AttributeError: 'NoneType' object has no attribute 'bio_read'


iveney commented Nov 22, 2020

I have an email pipeline that sends email during process_item, and I hit the error 'NoneType' object has no attribute 'bio_read' with something like:

def process_item(self, item, spider):
    if meets_criteria:
        mailer.send(...)
    return item

Changing the method to async and using await seems to solve it for me, since mailer.send returns a Deferred.

async def process_item(self, item, spider):
    if meets_criteria:
        await mailer.send(...)
    return item

Not sure if this is the right way to solve it, but it seems to work for me.

@brickyang

Same issue here. @iveney 's solution works.

@LidaGuo1999

Same issue here. @iveney's solution really works for me.

@wRAR wRAR changed the title Scrapy mail.send error Document that Mailsender.send() returns a Deferred Jan 29, 2023
@wRAR
Member

wRAR commented Jan 29, 2023

My question to Scrapy owners, @Gallaecio, can we consider the workaround as a fix and change documentation for MailSender.send() ?

It's not a workaround but the correct usage of this function, or of any other function that returns a Deferred instead of waiting until the action is done. It indeed makes sense to mention in the docs that you are supposed to wait for the Deferred instead of just calling this function and assuming it's synchronous.
