A problem I ran into with a web crawler, looking for help

from bs4 import BeautifulSoup
import requests

def gethtml(url):
    # Send a browser-like User-Agent with the request.
    headers = {
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36"
    }
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        return response.text
    # Any other status code (such as 418) yields None.
    return None

def parsehtml(html):
    soup = BeautifulSoup(html, 'lxml')
    print(soup.prettify())

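if __name__ == '__main__':
    url = "https://gleaming.cn/2021/02/13/Why-two-survivor-spaces/"
    html = gethtml(url)
    # gethtml returns None on a non-200 status, so only parse real content.
    if html is not None:
        parsehtml(html)
    else:
        print("request failed: no HTML to parse")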

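Running this crawler code gets an HTTP 418 response, even though I have already set headers. Is something else wrong? Any help would be appreciated.
Also, if I change the code to the version below, I get the result. Am I using requests incorrectly?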
import urllib.request
from bs4 import BeautifulSoup

def getHtml(url):
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'lxml')
    print(soup.prettify())

if __name__ == '__main__':
    getHtml("https://gleaming.cn/2021/02/13/Why-two-survivor-spaces/")
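
Since the only visible difference between the two versions is the client library, one way to narrow this down is to compare the headers each client puts on the request. The following is a minimal sketch, not a diagnosis: it does not explain why the server answers 418, it only shows what differs. The helper names (urllib_default_headers, requests_prepared_headers, header_diff) are made up for illustration, and no request is sent over the network.

```python
import urllib.request


def urllib_default_headers():
    """Headers the default urllib opener adds by itself.

    Lower layers add a few more when the request is sent (Host, for example).
    """
    opener = urllib.request.build_opener()
    return {name.title(): value for name, value in opener.addheaders}


def requests_prepared_headers(url, headers):
    """Headers requests would send: session defaults merged with our own."""
    # Imported here so the urllib half still runs where requests is missing.
    import requests

    session = requests.Session()
    prepared = session.prepare_request(
        requests.Request("GET", url, headers=headers)
    )
    return dict(prepared.headers)


def header_diff(a, b):
    """Map each header name whose value differs to its (a, b) value pair.

    Names are compared case-insensitively; a missing header shows as None.
    """
    a = {k.lower(): v for k, v in a.items()}
    b = {k.lower(): v for k, v in b.items()}
    names = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in names if a.get(k) != b.get(k)}
```

Calling header_diff(urllib_default_headers(), requests_prepared_headers(url, headers)) with the url and headers from the first script lists every header that differs between the two clients. Those can then be removed or changed one at a time in the requests version to see which one, if any, the server reacts to.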

2 replies

This is my first post, and I forgot to indent the code here.

Are you sure you are writing Python? Looking at that indentation, I thought it was assembly language.