Description
Hi, I'm new to Scrapy and I want to send some emails after the spider closes, but I get some errors. Does anyone know why? I'm using Python 2.7 and Scrapy 1.5.1.
Here is my code:
```python
import scrapy
from scrapy import signals
from scrapy.mail import MailSender
from scrapy.utils.project import get_project_settings


class AlertSpider(scrapy.Spider):
    name = "alert"
    start_urls = ['http://www.test.com']
    mails = []

    def parse(self, response):
        # Do some work
        pass

    @classmethod
    def from_crawler(cls, crawler):
        spider = cls()
        crawler.signals.connect(spider.spider_closed, signals.spider_closed)
        return spider

    def spider_closed(self, spider):
        settings = get_project_settings()
        mailer = MailSender.from_settings(settings)
        # first e-mail
        mailer.send(to=["xxxx@gmail.com"], subject='subject1', body='body1')
        # second e-mail
        return mailer.send(to=["xxxx@gmail.com"], subject='subject2', body='body2')
```
I want to send two e-mails after the spider closes, but I get the errors below.
(By the way, there is no problem if I only send one e-mail.)
File "C:\Software\Python27\lib\site-packages\twisted\internet\selectreactor.py", line 149, in _doReadOrWrite
  why = getattr(selectable, method)()
File "C:\Software\Python27\lib\site-packages\twisted\internet\tcp.py", line 243, in doRead
  return self._dataReceived(data)
File "C:\Software\Python27\lib\site-packages\twisted\internet\tcp.py", line 249, in _dataReceived
  rval = self.protocol.dataReceived(data)
File "C:\Software\Python27\lib\site-packages\twisted\protocols\tls.py", line 330, in dataReceived
  self._flushReceiveBIO()
File "C:\Software\Python27\lib\site-packages\twisted\protocols\tls.py", line 300, in _flushReceiveBIO
  self._flushSendBIO()
File "C:\Software\Python27\lib\site-packages\twisted\protocols\tls.py", line 252, in _flushSendBIO
  bytes = self._tlsConnection.bio_read(2 ** 15)
exceptions.AttributeError: 'NoneType' object has no attribute 'bio_read'
It seems that Twisted doesn't close the I/O, but I can't find any close method in the MailSender class. Has anyone else met this error?
Activity
appleshowc commentedon Nov 5, 2018
I met the same error, but I haven't figured it out.
I sent one email and it was sent successfully, but the same error info appears in my console.
I'm using Python 3.7 and Scrapy 1.5.1.
Hope someone can fix it.
111qqz commentedon Nov 9, 2018
Same problem here.
I tried to send emails in the "close_spider" method of the pipeline class, because I have several spiders and I don't want to duplicate the email-sending code several times.
After I replaced "mailer.send(...)" with "return mailer.send(...)", the problem disappeared.
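That workaround can be sketched as a pipeline like the following (the class name and message text are illustrative, and it assumes the usual MAIL_* settings are configured in the project): Scrapy waits on a Deferred returned from close_spider, so the reactor is not torn down mid-send.

```python
class EmailReportPipeline:
    """Illustrative pipeline that mails one report per spider.

    Sketch only: assumes Scrapy is installed and the MAIL_* settings
    (MAIL_HOST, MAIL_FROM, ...) are configured in the project.
    """

    def process_item(self, item, spider):
        return item  # pass items through untouched

    def close_spider(self, spider):
        # Imported lazily so the class definition stands alone without Scrapy.
        from scrapy.mail import MailSender

        mailer = MailSender.from_settings(spider.crawler.settings)
        # Returning the Deferred is the crucial part: Scrapy waits on it,
        # so the reactor is not shut down before _sent_ok/_sent_failed run.
        return mailer.send(
            to=["xxxx@gmail.com"],
            subject="%s finished" % spider.name,
            body="crawl done",
        )
```

This keeps the email code in one place for all spiders, as 111qqz wanted.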
111qqz commentedon Nov 9, 2018
By the way, I'm using Python 3.7 and Scrapy 1.5.1.
XuCcc commentedon Feb 21, 2019
I met the same problem when I tried to send an email from a pipeline. It throws the error into the logs, but my email was sent successfully.
The console outputs:
Python and scrapy version:
And my code:
Email settings:
GloriaXie123 commentedon Mar 3, 2019
I have the same issue when I use Scrapy to send email; the email is sent successfully.
I am using Python 3.6 and Scrapy 1.6.
Here is the error stack trace:
File "D:\graduate\venv\lib\site-packages\twisted\python\log.py", line 103, in callWithLogger
return callWithContext({"system": lp}, func, *args, **kw)
File "D:\graduate\venv\lib\site-packages\twisted\python\log.py", line 86, in callWithContext
return context.call({ILogContext: newCtx}, func, *args, **kw)
File "D:\graduate\venv\lib\site-packages\twisted\python\context.py", line 122, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "D:\graduate\venv\lib\site-packages\twisted\python\context.py", line 85, in callWithContext
return func(*args,**kw)
--- ---
File "D:\graduate\venv\lib\site-packages\twisted\internet\selectreactor.py", line 149, in _doReadOrWrite
why = getattr(selectable, method)()
File "D:\graduate\venv\lib\site-packages\twisted\internet\tcp.py", line 243, in doRead
return self._dataReceived(data)
File "D:\graduate\venv\lib\site-packages\twisted\internet\tcp.py", line 249, in _dataReceived
rval = self.protocol.dataReceived(data)
File "D:\graduate\venv\lib\site-packages\twisted\protocols\tls.py", line 330, in dataReceived
self._flushReceiveBIO()
File "D:\graduate\venv\lib\site-packages\twisted\protocols\tls.py", line 300, in _flushReceiveBIO
self._flushSendBIO()
File "D:\graduate\venv\lib\site-packages\twisted\protocols\tls.py", line 252, in _flushSendBIO
bytes = self._tlsConnection.bio_read(2 ** 15)
builtins.AttributeError: 'NoneType' object has no attribute 'bio_read'
easeflyer commentedon May 4, 2019
I have the same issue when I use Scrapy to send email; the email is sent successfully.
13240137000 commentedon May 5, 2019
Same problem here. When I try to send email via the email module I get the same error. I'm using Python 3.6.2 and Scrapy 1.6.0.
Ksianka commentedon Jun 2, 2019
test_spider.py
Hello,
it looks like the problem lies in the use of Twisted's Deferred class in Scrapy.
MailSender.send() returns a Twisted Deferred object (see line 106 of the scrapy.mail module) with callbacks _sent_ok and _sent_failed for success and failure respectively (line 102 in scrapy.mail).
Using MailSender.send() in spider_closed generates logs where the spider is closed and then the mail is sent, which looks like expected behaviour:
2019-06-02 19:54:08 [scrapy.core.engine] INFO: Spider closed (finished)
2019-06-02 19:54:10 [scrapy.mail] INFO: Mail sent OK: To=['XXXX@gmail.com'] Cc=[] Subject="subject2" Attachs=0
However, you get this error in the traceback:
  bytes = self._tlsConnection.bio_read(2 ** 15)
builtins.AttributeError: 'NoneType' object has no attribute 'bio_read'
My explanation of the error:
As far as I understand, the end of the Scrapy crawler's work triggers the Twisted reactor/main-loop shutdown and disconnectAll() before the _sent_ok or _sent_failed callback has been executed.
The callback then tries to communicate over the lost TLS connection.
The error itself is the result of TLSMemoryBIOProtocol.connectionLost(), triggered by the end of the crawler's work, which assigns None to the attribute _tlsConnection (see line 407 of twisted.protocols.tls).
The line self._tlsConnection = None was added to Twisted in March 2018 (see pull request twisted/twisted#955 for reference).
Without that pull request, the same use of MailSender.send() produced no error.
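To make that failure mode concrete, here is a tiny stand-in for the Twisted code paths cited above (a simplified model written for this thread, not Twisted itself): once connectionLost() clears _tlsConnection, any late callback that flushes the BIO ends up calling bio_read on None.

```python
# Toy model of the failure described above -- NOT Twisted itself.
# TLSMemoryBIOProtocol.connectionLost() sets self._tlsConnection = None,
# so a Deferred callback that fires after reactor shutdown hits None.bio_read.

class FakeTLSConnection:
    def bio_read(self, size):
        return b""  # pretend nothing is buffered


class ToyTLSProtocol:
    def __init__(self):
        self._tlsConnection = FakeTLSConnection()

    def connectionLost(self):
        # Mirrors the line added in twisted/twisted#955.
        self._tlsConnection = None

    def _flushSendBIO(self):
        # The call that blows up in the real traceback.
        return self._tlsConnection.bio_read(2 ** 15)


proto = ToyTLSProtocol()
proto._flushSendBIO()       # fine while the connection is alive
proto.connectionLost()      # crawler finished -> reactor shutdown
try:
    proto._flushSendBIO()   # the late _sent_ok/_sent_failed path
except AttributeError as e:
    print(e)                # 'NoneType' object has no attribute 'bio_read'
```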
As a workaround, and based on my very limited knowledge of Twisted's Deferred class and Scrapy, I can propose the following:
one way to guarantee that the Twisted reactor/main loop is not shut down before MailSender.send() has finished with its callbacks is to return the resulting Deferred instance. See the example:
```python
def spider_closed(self):
    settings = get_project_settings()
    mailer = MailSender.from_settings(settings)
    return mailer.send(to=['XXXX@gmail.com'], subject='subject2', body='body2')
```
In this case the reactor/main-loop shutdown waits. You can see it in the logs:
2019-06-02 20:00:20 [scrapy.core.engine] INFO: Closing spider (finished)
2019-06-02 20:00:22 [scrapy.mail] INFO: Mail sent OK: To=['XXXX@gmail.com'] Cc=[] Subject="subject2" Attachs=0
2019-06-02 20:00:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
2019-06-02 20:00:22 [scrapy.core.engine] INFO: Spider closed (finished)
My question to the Scrapy owners, @Gallaecio: can we consider this workaround a fix and update the documentation for MailSender.send()?
Or can someone continue digging into the Twisted world and propose more substantial adjustments for using Deferred in Scrapy?
beausoft commentedon Jun 11, 2019
I have the same issue when I use Scrapy to send email; the email is sent successfully.
Bone117 commentedon Jul 24, 2019
```python
self._send_mail(body, subject).addCallback(lambda x: x)
```
CarterPape commentedon Feb 3, 2020
I can verify that this is still an issue.
The email goes through, but a fatal error gets thrown with the following traceback:
iveney commentedon Nov 22, 2020
I have an email pipeline that sends email during process_item and I hit the error 'NoneType' object has no attribute 'bio_read'.
Changing the function to async and using await seems to solve it for me, as mailer.send returns the Deferred object.
Not sure if this is the right way to solve it, but it seems to work for me.
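For reference, a minimal sketch of that approach (the pipeline name is hypothetical; it assumes a recent Scrapy, which accepts an async def process_item and lets you await Deferreds, plus configured MAIL_* settings):

```python
class AsyncEmailPipeline:
    """Sketch of the async workaround: awaiting the Deferred keeps this
    coroutine (and hence the engine) alive until the send callbacks run.
    Assumes Scrapy 2.x, where process_item may be a coroutine.
    """

    async def process_item(self, item, spider):
        # Imported lazily so the class definition stands alone without Scrapy.
        from scrapy.mail import MailSender

        mailer = MailSender.from_settings(spider.crawler.settings)
        # mailer.send() returns a Twisted Deferred; awaiting it defers
        # reactor shutdown until _sent_ok/_sent_failed have fired.
        await mailer.send(to=["xxxx@gmail.com"],
                          subject="item scraped",
                          body=str(item))
        return item
```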
brickyang commentedon May 20, 2021
Same issue here. @iveney's solution works.