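After logging in to the Xiaomi store (mi.com) successfully, I wanted to scrape the user information shown on the personal center page. When I fetched the page directly, I found that the user information is loaded via AJAX and rendered by a JS template, so it is not present in the returned HTML.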
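
url = 'https://account.xiaomi.com/pass/serviceLoginAuth2?_dc=' + str(int(time.time() * 1000))
res = self.sess.post(url, data=data)
# The response body is prefixed with '&&&START&&&', which must be removed before parsing
data = json.loads(res.content.decode('utf-8').replace('&&&START&&&', ''))
if data['desc'] == '成功':  # '成功' means "success"
    pprint(data)
    res2 = self.sess.get('https://www.mi.com/user/portal')
    res2.encoding = 'utf-8'
    print(res2.text)
    soup = BeautifulSoup(res2.text, 'lxml')
    # Cannot be found: this element is rendered by JS, so select_one returns None
    img_tag = soup.select_one('div.portal-content-box div.user-card>img')
    print(img_tag['alt'], img_tag['src'])

I then switched to calling the API directly, but the response has status code 500 and the body says 请求来源不合法 ("the request source is not valid").
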
url = 'https://account.xiaomi.com/pass/serviceLoginAuth2?_dc=' + str(int(time.time() * 1000))
res = self.sess.post(url, data=data)
data = json.loads(res.content.decode('utf-8').replace('&&&START&&&', ''))
if data['desc'] == '成功':  # '成功' means "success"
    pprint(data)
    self.sess.get('https://www.mi.com/user/portal')
    self.sess.headers[quote(':authority')] = 'api2.service.order.mi.com'
    self.sess.headers[quote(':method')] = 'GET'
    self.sess.headers[quote(':path')] = '/user/userinfo'
    self.sess.headers[quote(':scheme')] = 'https'
    # Calling the API directly returns status code 500
    res2 = self.sess.get('https://api2.service.order.mi.com/user/userinfo')
    print(res2.text)
    res2.encoding = 'utf-8'
    data2 = json.loads(res2.text)
    pprint(data2)

How can I solve the problem above?
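Is driving a real browser with selenium + webdriver the only way to scrape this page? If so, what is the point of implementing the JS-encrypted login at all?
I hope we can exchange ideas on this...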
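
One direction that may be worth testing, given as an unverified sketch rather than a confirmed fix: `:authority`, `:method`, `:path` and `:scheme` are HTTP/2 pseudo-headers that a browser's dev tools display, and `requests` speaks HTTP/1.1, where the host and path are taken from the URL. Passing those names through `quote()` only produces ordinary headers with percent-encoded names. Since the error message complains about the request source, the sketch below assumes (this is a guess, not something the post confirms) that the API checks `Referer` and/or `Origin`. The helper name `build_api_headers` and the header values are hypothetical.

```python
from urllib.parse import quote

# Assumption: the portal page is what a browser would report as the referrer.
PORTAL_URL = 'https://www.mi.com/user/portal'


def build_api_headers(referer=PORTAL_URL):
    """Hypothetical helper: plain HTTP/1.1 headers for the userinfo call.

    Assumes the server validates Referer/Origin; that is not confirmed
    anywhere in the original post.
    """
    return {
        'Referer': referer,
        'Origin': 'https://www.mi.com',
        'Accept': 'application/json, text/plain, */*',
    }


# quote() percent-encodes the colon, so the code in the post sends a header
# literally named '%3Aauthority' instead of an HTTP/2 pseudo-header.
encoded_name = quote(':authority')
headers = build_api_headers()
```

With the session from the post, this would be applied as `self.sess.headers.update(build_api_headers())` before the `self.sess.get(...)` call to the API, in place of the four `quote(':...')` lines. If the server also requires a token or cookie that the browser obtains elsewhere, these headers alone will not be enough.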
RussiaVk