Cannot crawl HTTPS sites that only support TLS 1.2 #701

Closed
@code4craft

Description

@code4craft

WebMagic's default HttpClient only uses TLSv1 for requests, so sites that only support TLS 1.2 (for example https://juejin.im/) fail with this error:

javax.net.ssl.SSLException: Received fatal alert: protocol_version
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:154)
	at sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:2023)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1125)
	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
	at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:394)
	at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:353)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:141)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at us.codecraft.webmagic.downloader.HttpClientDownloader.download(HttpClientDownloader.java:85)

The current fix is to add TLS 1.2 support when building the SSLConnectionSocketFactory in HttpClientGenerator.

Activity

added this to the WebMagic-0.8.0 milestone on Nov 29, 2017
added a commit that references this issue on Nov 29, 2017

#701 support to tls1.2

code4craft

code4craft commented on Nov 29, 2017

@code4craft
Owner, Author

The update will be released in version 0.7.4.

As a temporary workaround, modify the buildSSLConnectionSocketFactory method in HttpClientGenerator:

return new SSLConnectionSocketFactory(createIgnoreVerifySSL(), new String[]{"SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2"},
                    null,
                    new DefaultHostnameVerifier());

Then write your own HttpClientDownloader implementation that uses it, and set it on the Spider.
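
The protocol array in the snippet above ends up as the enabled protocol list of each TLS socket that HttpClient opens. A minimal JDK-only sketch of that single step follows, assuming a standard JSSE provider. It uses an unconnected socket, so no network access is needed. The class name EnabledProtocolsDemo and the method applyProtocols are hypothetical and are not part of WebMagic or HttpClient.

```java
import java.io.IOException;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;

// Hypothetical helper for illustration only; not part of WebMagic or HttpClient.
class EnabledProtocolsDemo {

    // Applies the given protocol names to an unconnected SSLSocket and returns
    // what the socket then reports as enabled. No handshake takes place.
    static String[] applyProtocols(String... protocols) {
        try {
            SSLContext context = SSLContext.getDefault();
            try (SSLSocket socket = (SSLSocket) context.getSocketFactory().createSocket()) {
                // Throws IllegalArgumentException if a name is unknown to this JVM.
                socket.setEnabledProtocols(protocols);
                return socket.getEnabledProtocols();
            }
        } catch (NoSuchAlgorithmException | IOException e) {
            throw new IllegalStateException("Could not create a TLS socket", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(applyProtocols("TLSv1.2")));
    }
}
```

Enabling a protocol here only makes it eligible for the handshake. A JVM security policy can still disable it, which on recent JDKs is typically the case for SSLv3.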

z201

z201 commented on Mar 8, 2018

@z201

= = I ran into this problem too.
org.apache.http.conn.ssl.SSLConnectionSocketFactory - Enabled protocols: [TLSv1]
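
A log line like this one shows what HttpClient ended up enabling, but not why. One way to narrow it down is to list the protocols that differently named SSLContext instances enable by default on the JVM in use. The sketch below does that with JDK classes only. Which context name WebMagic requests is not stated in this thread, so the names tried in main are assumptions, and ContextDefaults and defaultProtocols are hypothetical names.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

// Hypothetical diagnostic for illustration only.
class ContextDefaults {

    // Returns the protocols that the named SSLContext enables by default.
    static List<String> defaultProtocols(String contextAlgorithm) {
        try {
            SSLContext context = SSLContext.getInstance(contextAlgorithm);
            // null arguments select the default key managers, trust managers and randomness.
            context.init(null, null, null);
            return Arrays.asList(context.getDefaultSSLParameters().getProtocols());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Cannot create SSLContext " + contextAlgorithm, e);
        }
    }

    public static void main(String[] args) {
        // Context names chosen for comparison; they are assumptions, not taken from WebMagic.
        for (String name : new String[] {"TLSv1", "TLSv1.2", "TLS"}) {
            try {
                System.out.println(name + " -> " + defaultProtocols(name));
            } catch (IllegalStateException e) {
                System.out.println(name + " -> not available on this JVM");
            }
        }
    }
}
```

If a context obtained under an older name reports a list without TLSv1.2, passing an explicit protocol array, as in the workaround above, is one way to widen it.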

ghost

ghost commented on Jun 4, 2018

@ghost

0.7.4 still hasn't been released??

henushang

henushang commented on Jun 7, 2018

@henushang

0.7.4 hasn't been released yet~

z201

z201 commented on Jun 7, 2018

@z201

Since a workaround has already been given, everyone can just apply the change manually.

liuyatao

liuyatao commented on Jun 20, 2018

@liuyatao

@z201 How do I make the change?

DenisYin66

DenisYin66 commented on Dec 22, 2018

@DenisYin66

Frustrating~~ Hoping version 0.7.4 gets released soon

Tfancy

Tfancy commented on Mar 6, 2019

@Tfancy

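It really is a bit awkward~ First time looking in here, and I was baffled.

Haha, same here. And seeing that version 0.7.4 still hasn't come out, I'm wondering whether to keep using this for crawling. Is this fellow still around?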

duncan0428

duncan0428 commented on Mar 23, 2019

@duncan0428

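The official website is really unstable and goes down every day~ I'm not even halfway through learning and it has already gone down 3 times..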

MrMeng-hub

MrMeng-hub commented on Mar 24, 2019

@MrMeng-hub

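The official website is really unstable and goes down every day~ I'm not even halfway through learning and it has already gone down 3 times..

Which official website?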

duncan0428

duncan0428 commented on Mar 24, 2019

@duncan0428

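The official website is really unstable and goes down every day~ I'm not even halfway through learning and it has already gone down 3 times..

Which official website?

The webmagic.io site. It drops N times a day..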

Jasonandy

Jasonandy commented on Apr 27, 2019

@Jasonandy

Waiting for 0.7.4

duncan0428

duncan0428 commented on Apr 27, 2019

@duncan0428
Jasonandy

Jasonandy commented on May 6, 2019

@Jasonandy

In the end I downloaded the source package and compiled it myself.

shengdongli

shengdongli commented on May 20, 2019

@shengdongli

Hello, after making the change according to your solution, it runs successfully in Eclipse, but running it in IDEA still fails with: Received fatal alert: protocol_version
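
The thread does not explain the difference between the two IDEs. One possible cause, stated here purely as an assumption, is that they run the project on different JDKs or resolve a different WebMagic jar (the patched build in one, the unpatched Maven artifact in the other). A small JDK-only sketch to check both from inside the running program follows. RuntimeCheck and locationOf are hypothetical names.

```java
import java.security.CodeSource;

// Hypothetical diagnostic for illustration only.
class RuntimeCheck {

    // Returns where a class was loaded from, or a marker for JDK classes,
    // which have no code source.
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(JDK runtime)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        System.out.println("java.version = " + System.getProperty("java.version"));
        // Pass a fully qualified class name as the first argument, for example
        // the HttpClientGenerator class of the WebMagic build on the classpath.
        Class<?> type = args.length > 0 ? Class.forName(args[0]) : RuntimeCheck.class;
        System.out.println(type.getName() + " loaded from " + locationOf(type));
    }
}
```

Running it once from each IDE and comparing the two outputs shows whether the JDK version or the jar location differs.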

wolaiye1010

wolaiye1010 commented on Dec 25, 2019

@wolaiye1010

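The code on master has already been changed; only the artifact in the Maven repository is old, so pulling it in through mvn still fails. You can download the source directly, rebuild the package, and depend on the rebuilt jar.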

CarpCap

CarpCap commented on Apr 24, 2020

@CarpCap

Has the Maven repository been updated yet?

ghost

ghost commented on Apr 27, 2020

@ghost

It's 2020 and version 0.7.4 still hasn't been released

scott17090025902

scott17090025902 commented on May 27, 2020

@scott17090025902

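It seems the author is no longer maintaining this because of other commitments. The solution has already been given: download the source, modify the buildSSLConnectionSocketFactory method in HttpClientGenerator to remove TLSv1.3, then run mvn clean install -DskipTests, and the demo will run.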
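
The thread does not say why TLSv1.3 had to be removed. One plausible reason, stated here as an assumption, is that the JVM in use did not know that protocol name, in which case SSLSocket.setEnabledProtocols rejects the whole array with an IllegalArgumentException. Instead of editing the list by hand, the requested names can be filtered against what the JVM supports. This is a minimal JDK-only sketch; SupportedProtocols and retainSupported are hypothetical names.

```java
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.net.ssl.SSLContext;

// Hypothetical helper for illustration only.
class SupportedProtocols {

    // Keeps only the requested protocol names that this JVM supports, preserving their order.
    static String[] retainSupported(String... requested) {
        Set<String> supported;
        try {
            supported = new HashSet<>(Arrays.asList(
                    SSLContext.getDefault().getSupportedSSLParameters().getProtocols()));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
        List<String> kept = new ArrayList<>();
        for (String name : requested) {
            if (supported.contains(name)) {
                kept.add(name);
            }
        }
        return kept.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // On a JVM without TLS 1.3 support, only TLSv1.2 remains in the result.
        System.out.println(Arrays.toString(retainSupported("TLSv1.2", "TLSv1.3")));
    }
}
```

The resulting array could be passed as the protocol argument of the SSLConnectionSocketFactory constructor. Supported is not the same as permitted: a protocol that the JVM's security policy disables still passes this filter.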

lihongji426

lihongji426 commented on Aug 4, 2020

@lihongji426

Why bother waiting for an update? Isn't Python good enough?

lomoye

lomoye commented on Sep 11, 2020

@lomoye

Write your own classes that extend AbstractDownloader and HttpClientGenerator, and change the buildSSLConnectionSocketFactory method in HttpClientGenerator:

return new SSLConnectionSocketFactory(createIgnoreVerifySSL(), new String[]{"SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2"},
                    null,
                    new DefaultHostnameVerifier());

lubonain

lubonain commented on Nov 11, 2020

@lubonain

It actually got updated?

Cannot crawl HTTPS sites that only support TLS 1.2 · Issue #701 · code4craft/webmagic