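I'm a beginner. I'm using C# to imitate a browser request to a website's AJAX endpoint, but the server always returns part of the data and then closes the connection. With Python I can get the complete response while sending hardly any headers. How can I get the complete server response in C#?
My C# code is as follows:
```C#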
// Requires: System, System.Collections, System.IO, System.IO.Compression, System.Net, System.Text
String url = String.Format("http://www.flycua.com/otabooking/flight-search!doFlightSearch.shtml?rand={0}", new Random().NextDouble());
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);
request.Method = "POST";
ArrayList byteList = new ArrayList();
request.KeepAlive = true;
String postString = "searchCond={\"tripType\":\"OW\",\"adtCount\":1,\"chdCount\":0,\"infCount\":0,\"currency\":\"CNY\",\"sortType\":\"a\",\"segmentList\":[{\"deptCd\":\"CAN\",\"arrCd\":\"NAY\",\"deptDt\":\"2016-09-22\",\"deptCityCode\":\"CAN\",\"arrCityCode\":\"BJS\"}],\"sortExec\":\"a\",\"page\":\"0\"}";
Byte[] postBytes = Encoding.UTF8.GetBytes(postString);
request.Host = "www.flycua.com";
request.ContentLength = postBytes.Length;
request.Headers["Pragma"] = "no-cache";
request.Headers["Cache-Control"] = "no-cache";
request.Accept = "application/json, text/javascript, */*; q=0.01";// ok
request.Headers["Accept-Encoding"] = "gzip, deflate";
request.Referer = @"http://www.flycua.com/flight2014/can-nay-160920_CNY.html";
request.Headers["Origin"] = "http://www.flycua.com";
request.Headers["X-Requested-With"] = "XMLHttpRequest";
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36";
request.Timeout = 10000;
request.ReadWriteTimeout = 10000;
request.ContentType = "application/x-www-form-urlencoded";
try
{
    // Send the POST body.
    using (Stream requestStream = request.GetRequestStream())
    {
        requestStream.Write(postBytes, 0, postBytes.Length);
    }
    WebResponse response = request.GetResponse();
    StringBuilder html = new StringBuilder();
    using (Stream responseStream = response.GetResponseStream())
    {
        Stream stream;
        if (response.Headers["Content-Encoding"] == "gzip")
        {
            stream = new GZipStream(responseStream, CompressionMode.Decompress);
        }
        else
        {
            // Any response that is not gzip, including one with no
            // Content-Encoding at all, is treated as deflate here.
            stream = new DeflateStream(responseStream, CompressionMode.Decompress);
        }
        StreamReader responseStreamReader = new StreamReader(stream);
        html.Append(responseStreamReader.ReadToEnd());
    }
    Console.WriteLine(html);
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
```
Python code:
```Python
import random

from requests import post


def get_url():
    # Append a random value as the rand query parameter.
    return "http://www.flycua.com/otabooking/flight-search!doFlightSearch.shtml?rand={0}".format(random.random())


def get_data():  # dept, arr, date
    data = {
        "searchCond": """{"tripType":"OW","adtCount":1,"chdCount":0,"infCount":0,"currency":"CNY","sortType":"a","segmentList":[{"deptCd":"CAN","arrCd":"NAY","deptDt":"2016-09-29","deptCityCode":"CAN","arrCityCode":"BJS"}],"sortExec":"a","page":"0"}""",
    }
    return post(get_url(), data).content.decode("utf-8")


if __name__ == "__main__":
    print(get_data())
```
This is a mirrored post. Source: 北邮人论坛 (BYR Forum) / dot-net / #4793, synced 2016/9/17
dotNET, posted by bot
[Question] C# crawler: fetching a fixed page
aajjnn
2016/9/17, mirror synced, 5 replies
5 replies
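If you drop gzip from Accept-Encoding, wouldn't you no longer need the GZipStream?
【 aajjnn (无欲则刚) wrote: 】
: I'm a beginner. I'm using C# to imitate a browser request to a website's AJAX endpoint, but the server always returns part of the data and then closes the connection. With Python I can get the complete response while sending hardly any headers. How can I get the complete server response in C#?
: My C# code is as follows:[md]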
: ```C#
: ...................
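Whether a decompression stream is needed depends on the `Content-Encoding` the server actually sends back, not only on what the request advertises. A minimal Python sketch of that selection follows. The helper name `decode_body` is hypothetical, and it assumes the whole body has already been read into memory. Note the last branch: a body with no `Content-Encoding` is returned untouched, whereas the C# code in the question sends every non-gzip response through `DeflateStream`.

```python
import gzip
import zlib


def decode_body(body, content_encoding=None):
    """Return the decoded bytes of an HTTP body (hypothetical helper)."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # Standard form: deflate data inside a zlib wrapper.
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send raw deflate without the wrapper.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # No (or unrecognised) Content-Encoding: assume the body is already plain.
    return body


if __name__ == "__main__":
    payload = b'{"ok": true}'
    print(decode_body(gzip.compress(payload), "gzip"))
    print(decode_body(payload))
```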
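With gzip removed, the server returns uncompressed text. But why does the server reject the request when it comes from C#?
【 aMZ (:)) wrote: 】
: If you drop gzip from Accept-Encoding, wouldn't you no longer need the GZipStream?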
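One difference between the two clients is visible in the code itself. It is offered here as a place to look, not as a confirmed cause. `requests` percent-encodes the values of a `dict` passed as `data`, while the C# code writes `postString` to the request stream exactly as typed, with the JSON's quotes, braces and brackets unescaped. The sketch below uses only the standard library to print both bodies side by side. The shortened `search_cond` value is illustrative, and `urlencode` stands in for the encoding that `requests` applies.

```python
from urllib.parse import parse_qs, urlencode

# Shortened stand-in for the searchCond JSON used in the question.
search_cond = '{"tripType":"OW","adtCount":1,"page":"0"}'

# What the C# code sends: the JSON is concatenated into the body unescaped.
raw_body = "searchCond=" + search_cond

# What a form-encoding client sends: reserved characters are percent-encoded.
encoded_body = urlencode({"searchCond": search_cond})

if __name__ == "__main__":
    print(raw_body)
    print(encoded_body)
    # Decoding the encoded body gives back the original JSON string.
    print(parse_qs(encoded_body)["searchCond"][0] == search_cond)
```

On the C# side, `Uri.EscapeDataString` applied to the JSON value would be one way to produce a comparably encoded body.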
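Thanks for the pointer. I don't think my header fields are the problem, because the Python script sends hardly any headers and the server never cuts off its response. With C#, the server responds with 200, but the body never ends with the correct terminating bytes, which throws an exception. In a packet capture I can see the server drop the connection in the middle of the data transfer.
Could one of the parameters in my request be wrong?
【 aMZ wrote: 】
: I'd suggest testing the parameters in Postman first, then writing the code.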
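The missing terminating bytes described in this reply would fit a chunked response that is cut off before its final zero-length chunk. That the server uses `Transfer-Encoding: chunked` is an assumption; the thread does not say. Under that assumption, the sketch below (the function name `parse_chunked` is hypothetical) decodes a raw chunked body taken from a packet capture and reports whether the terminating chunk ever arrived, which may help when comparing the C# and Python captures.

```python
def parse_chunked(raw):
    """Decode a chunked HTTP body; return (data, complete).

    complete is True only if the terminating zero-length chunk was seen.
    Trailer headers after the last chunk are ignored.
    """
    data = bytearray()
    pos = 0
    while True:
        line_end = raw.find(b"\r\n", pos)
        if line_end == -1:
            return bytes(data), False  # size line never finished
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size_field = raw[pos:line_end].split(b";", 1)[0].strip()
        try:
            size = int(size_field.decode("ascii"), 16)
        except ValueError:
            return bytes(data), False  # malformed size line
        if size < 0:
            return bytes(data), False  # a negative size is never valid
        if size == 0:
            return bytes(data), True  # terminating chunk reached
        start = line_end + 2
        chunk = raw[start:start + size]
        data.extend(chunk)
        if len(chunk) < size:
            return bytes(data), False  # connection closed mid-chunk
        pos = start + size + 2  # skip the CRLF that follows the chunk data


if __name__ == "__main__":
    print(parse_chunked(b"5\r\nhello\r\n0\r\n\r\n"))
    print(parse_chunked(b"5\r\nhel"))
```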