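Some web content is only accessible after logging in.
There are several ways to simulate a login.
1. Logging in with a POST request
1.1 Find the URL, Content-Type, and form data
(1) Find the URL of the POST request
Open the login page, open Chrome DevTools (right-click a blank area of the page and click [Inspect]), enter the account and password, and log in.
In the Network panel, find the URL of the POST request. The login request usually has login in its name, as shown in the figure below: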
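(2) Find the Content-Type
Find Content-Type in the request headers. Content-Type tells the server what type of data the client is sending. The four common POST content types are:
- application/x-www-form-urlencoded: the default Content-Type for form submissions. The data is encoded as key1=val1&key2=val2&..., and both key and val are URL-encoded.
- application/json: tells the server that the message body is a serialized JSON string.
- text/xml
- multipart/form-data: usually used to upload files.
In the request headers it looks like one of the following:
Content-Type: application/x-www-form-urlencoded
Content-Type: application/json;charset=UTF-8
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryrGKCBY7qhFd3TrwA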
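To make the difference between the first two content types concrete, here is a small sketch that encodes the same fields both ways using only the standard library. The field values are made up for illustration and contain characters that need escaping.

```python
import json
from urllib.parse import urlencode

# Hypothetical sample values, chosen to contain characters that need escaping.
fields = {"userName": "alice", "password": "p&ss word"}

# application/x-www-form-urlencoded: key=value pairs joined by '&',
# with keys and values URL-encoded.
form_body = urlencode(fields)

# application/json: the same fields serialized as a JSON string.
json_body = json.dumps(fields)

print(form_body)  # userName=alice&password=p%26ss+word
print(json_body)
```

Note how the `&` inside the password is escaped as `%26` in the form body, so it cannot be confused with the separator between fields.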
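(3) Find the form data
The submitted form data can be viewed under the [Payload] tab shown in the figure above, for example:
userName=xxx&password=xxx&verifyCode=
The submitted form data contains the user name, the password (the password seen here has most likely already been encrypted), and the verification code.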
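A payload string copied from DevTools can be turned into a Python dict with the standard library, which is a convenient starting point for the data passed to a POST request later. This is only a sketch using the placeholder payload shown above.

```python
from urllib.parse import parse_qsl

payload = "userName=xxx&password=xxx&verifyCode="

# keep_blank_values=True keeps verifyCode even though its value is empty.
form_data = dict(parse_qsl(payload, keep_blank_values=True))
print(form_data)
```

Without `keep_blank_values=True`, the empty `verifyCode` field would be dropped from the result.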
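1.2 Get the verification code
The form submission above requires a verification code (captcha).
Approach: fetch the captcha image with a GET request, then use text recognition (OCR) to read the characters in the image.
(1) Fetch the captcha image
The code is simple:
import requests

url = 'https://b.leyaoyao.com/lyy/rest/group/distributor/getLoginVerifyCode'
# A Session stores cookies set by this response and sends them with later requests
session = requests.Session()
r = session.get(url)
# Save the raw image bytes to disk
with open('verify_code_example.jpg', 'wb') as f:
    f.write(r.content)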
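If the request fails, the snippet above would silently save an error page as a .jpg file. A slightly more defensive variant could look like the sketch below. The helper name `save_captcha` is hypothetical, and the check on the Content-Type response header assumes the server labels the captcha as an image, which has not been verified for this site.

```python
def save_captcha(session, url, path):
    """Download the captcha with session.get and write it to path.

    session is expected to behave like requests.Session.
    Returns the number of bytes written.
    """
    r = session.get(url, timeout=10)
    # Raise for HTTP 4xx/5xx instead of saving an error page as an image.
    r.raise_for_status()
    # Assumption: the server sends an image/* Content-Type for the captcha.
    content_type = r.headers.get("Content-Type", "")
    if not content_type.startswith("image/"):
        raise ValueError(f"expected an image, got {content_type!r}")
    with open(path, "wb") as f:
        return f.write(r.content)
```

It would be called as `save_captcha(session, url, 'verify_code_example.jpg')` with the same session that is later used to log in.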
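(2) Recognize the text in the image
For text recognition there are currently two mainstream open-source frameworks, Tesseract and EasyOCR. After a brief look, Tesseract seemed the more recommendable of the two.
1.2.1 Tesseract
Install pytesseract:
sudo apt install libtesseract-dev tesseract-ocr
pip3 install pytesseract
Recognize the text in the image:
import pytesseract
from PIL import Image
text = pytesseract.image_to_string(Image.open('verify_code_example.jpg'))
Unfortunately, pytesseract could not recognize the captcha above; the returned text was empty.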
1.2.2 EasyOCR
Install EasyOCR:
pip3 install easyocr
Use easyocr; for the list of supported languages, see Supported Languages.
import easyocr
reader = easyocr.Reader(['ch_sim', 'en'], gpu=False)
results = reader.readtext('verify_code_example.jpg')
print(results)
An example run:
# python3 tmp.py
Using CPU. Note: This module is much faster with a GPU.
[([[0, 0], [55, 0], [55, 18], [0, 18]], 'FGSJU', 0.8078654978535195)]
For each detected text region, reader.readtext returns the bounding-box coordinates, the text, and the confidence.
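Since readtext returns a list of (coordinates, text, confidence) tuples, the captcha string still has to be picked out of it. The sketch below shows one way to do that. The helper name `best_text`, the 0.5 threshold, and the removal of spaces are all assumptions made for illustration, not part of EasyOCR.

```python
def best_text(results, min_confidence=0.5):
    """Return the most confident text from EasyOCR-style results, or None."""
    candidates = [(conf, text) for _bbox, text, conf in results
                  if conf >= min_confidence]
    if not candidates:
        return None
    # Pick the entry with the highest confidence.
    _conf, text = max(candidates, key=lambda c: c[0])
    # Assumption: the captcha itself contains no spaces.
    return text.replace(" ", "")

# The sample result shown above.
sample = [([[0, 0], [55, 0], [55, 18], [0, 18]], 'FGSJU', 0.8078654978535195)]
print(best_text(sample))  # FGSJU
```

Returning None for low-confidence results lets the caller decide to fetch a new captcha instead of submitting a likely wrong code.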
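If the following message appears, the recognition model needs to be downloaded. Models are stored in .EasyOCR/model under the home directory:
Using CPU. Note: This module is much faster with a GPU.
Downloading recognition model, please wait. This may take several minutes depending upon your network connection.
Note: the first run also needs to download the detection model. If the download is slow, the models can be downloaded manually from Jaided AI: EasyOCR model hub, unzipped, and placed in the ~/.EasyOCR/model/ directory.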
Jaided AI: EasyOCR documentation
1.3 Simulate the login
Once the URL and the data are known, the POST request can be sent with the Requests module. After logging in, the session keeps the login state.
import requests

session = requests.Session()

def login():
    url = 'https://b.leyaoyao.com/lyy/rest/group/distributor/login'
    # Form fields as seen in the request payload; fill in the real values
    param = {
        'userName': '',
        'password': ''
    }
    # Passing a dict via data= sends it as application/x-www-form-urlencoded
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
    }
    response_login = session.post(url, headers=headers, data=param)
    return response_login.status_code
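Because OCR can fail or misread a character, a login that depends on the verifyCode field may need more than one attempt. The following is a minimal sketch of a retry loop. Both callables are hypothetical placeholders: how a fresh captcha is recognized and how a successful login is detected depend on the site and are not specified here.

```python
def login_with_retries(get_code, submit, max_attempts=3):
    """Try to log in up to max_attempts times.

    get_code: hypothetical callable that downloads a fresh captcha and
              returns the recognized text, or None if OCR failed.
    submit:   hypothetical callable that posts the form with the given
              verifyCode and returns True when the login succeeded.
    Returns the number of the successful attempt, or None.
    """
    for attempt in range(1, max_attempts + 1):
        code = get_code()
        if not code:
            continue  # OCR failed; fetch a new captcha and try again
        if submit(code):
            return attempt
    return None
```

Keeping the two steps as injected callables means the loop can be tried out with simple stand-ins before it touches the real site.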
If the Content-Type is application/json;charset=UTF-8, call json.dumps(param) to serialize the data into a JSON string, and specify 'Content-type': 'application/json' in the headers:
import json

headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 9.0; SAMSUNG SM-F900U Build/PPR1.180610.011) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Mobile Safari/537.36',
    'Content-type': 'application/json'
}
# Get the venue income
url = 'https://b.leyaoyao.com/lyy/rest/income/benefit/group'
# Request parameters
param = {
    "startDate": "2022-08-17",
    "endDate": "2022-08-17",
    "pageIndex": 1,
    "pageSize": 10,
    "field": "all_amount",
    "sortDirection": "desc",
    "labels": []
}
# session is the logged-in requests.Session from the login step
response = session.post(url, data=json.dumps(param), headers=headers)
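As an alternative to calling json.dumps and setting the header by hand, Requests accepts a json= argument that does both. The sketch below builds the two variants without sending them, so they can be compared offline. The parameter dict is an abbreviated copy of the one above.

```python
import json
import requests

url = 'https://b.leyaoyao.com/lyy/rest/income/benefit/group'
# Abbreviated copy of the request parameters, for illustration only.
param = {"startDate": "2022-08-17", "endDate": "2022-08-17",
         "pageIndex": 1, "pageSize": 10}

# Build both requests without sending them.
manual = requests.Request('POST', url, data=json.dumps(param),
                          headers={'Content-Type': 'application/json'}).prepare()
auto = requests.Request('POST', url, json=param).prepare()

def body_text(prepared):
    """Return the prepared request body as str (it may be bytes)."""
    body = prepared.body
    return body.decode('utf-8') if isinstance(body, bytes) else body

print(auto.headers['Content-Type'])
print(json.loads(body_text(auto)) == json.loads(body_text(manual)))
```

With a session, the equivalent call would be `session.post(url, json=param, headers=headers)`, where headers only needs to carry the User-Agent.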
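1.4 Handling the response
The result returned by the POST request can be inspected in the [Preview] tab.
If it is in JSON format, response.json() parses it into a Python object (a dict or a list):
result = response.json()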
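response.json() raises an exception when the body is not valid JSON, for example if the server answers with an HTML page after the session has expired. A small wrapper, sketched below with the hypothetical name `parse_json`, could guard against that.

```python
def parse_json(response):
    """Return the decoded JSON body, or None if the body is not valid JSON.

    response is expected to behave like requests.Response.
    """
    # Fail early on HTTP errors such as 401 or 500.
    response.raise_for_status()
    try:
        return response.json()
    except ValueError:
        # The decoding error raised by requests is a ValueError subclass.
        return None
```

A None result can then be treated as a signal to log in again before retrying the request.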
As an example, querying the detailed income on 乐摇摇 (Leyaoyao):
https://b.leyaoyao.com/lyy/rest/income/benefit/group
param = {
"startDate": "2022-08-17",
"endDate": "2022-08-17",
"pageIndex": 1,
"pageSize": 10,
"field": "all_amount",
"sortDirection": "desc",
"labels": []
}
# response.json
{"result":0,"description":"","data":{"page":1,"total":1,"pageSize":10,"maxPage":1,"times":0,"items":[{"amount":1100.00,"onlineAmount":0.00,"offlineAmount":559.00,"adAmount":0.00,"giftAmount":324.00,"gameAmount":0.00,"gameGiftAmount":null,"rdGiftAmount":0.00,"adCount":0,"coins":977,"onlineCoins":101,"offlineCoins":876,"actualCoins":2003,"giftQuantity":27,"gameGiftQuantity":0,"rdGiftQuantity":null,"redCoins":0,"payCoins":1232,"wechatPayCount":null,"alipayPayCount":null,"unionPayCount":null,"jdPayCount":null,"payTotalCount":295,"giftConsumptionWeight":0,"customServiceFee":0.00,"valueAddedServiceFee":0.00,"groupId":1145312,"groupName":"吴川","equipmentCount":43,"module":"4,5,1,2,3,6,","displaySortBenefit":"Y","isactive":"Y"}],"offset":0}}
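Once parsed, the response is an ordinary nested dict, so individual values can be read by key. The sketch below works on an abbreviated copy of the sample response above. Treating "result": 0 as the success code is an assumption based on that sample only.

```python
import json

# Abbreviated copy of the sample response above (most item fields omitted).
raw = ('{"result":0,"description":"","data":{"page":1,"total":1,"pageSize":10,'
       '"maxPage":1,"items":[{"amount":1100.00,"coins":977,"groupId":1145312,'
       '"groupName":"吴川","equipmentCount":43}],"offset":0}}')
result = json.loads(raw)

# Assumption: result == 0 signals success, as in the sample.
if result["result"] != 0:
    raise RuntimeError(result["description"])

# Collect (group name, amount, coins) for every item on this page.
rows = [(item["groupName"], item["amount"], item["coins"])
        for item in result["data"]["items"]]
print(rows)
```

With a live response, `result = response.json()` would replace the `json.loads(raw)` line. The page and maxPage fields suggest that further pages could be requested by increasing pageIndex, though that has not been checked here.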