BYR Achieve · Mirror Forum

[Question] C# crawler help: scraping a fixed page

2016/9/17 · Mirror synced · 5 replies

I'm a beginner. I'm using C# to imitate a browser request to a website's AJAX endpoint, but the server always closes the connection after returning only part of the data. The same request from Python returns the complete response while sending almost no extra headers. How can I get the complete server response in C#? My C# code is below:

```C#
using System;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Text;

class FlightSearch
{
    static void Main()
    {
        String url = String.Format(
            "http://www.flycua.com/otabooking/flight-search!doFlightSearch.shtml?rand={0}",
            new Random().NextDouble());
        HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);
        request.Method = "POST";
        request.KeepAlive = true;

        // Note: the JSON value is sent as-is here, without percent-encoding,
        // whereas requests form-encodes the dict in the Python version.
        String postString = "searchCond={\"tripType\":\"OW\",\"adtCount\":1,\"chdCount\":0,\"infCount\":0,\"currency\":\"CNY\",\"sortType\":\"a\",\"segmentList\":[{\"deptCd\":\"CAN\",\"arrCd\":\"NAY\",\"deptDt\":\"2016-09-22\",\"deptCityCode\":\"CAN\",\"arrCityCode\":\"BJS\"}],\"sortExec\":\"a\",\"page\":\"0\"}";
        Byte[] postBytes = Encoding.UTF8.GetBytes(postString);

        // Headers copied from the browser request.
        request.Host = "www.flycua.com";
        request.ContentLength = postBytes.Length;
        request.Headers["Pragma"] = "no-cache";
        request.Headers["Cache-Control"] = "no-cache";
        request.Accept = "application/json, text/javascript, */*; q=0.01"; // ok
        request.Headers["Accept-Encoding"] = "gzip, deflate";
        request.Referer = @"http://www.flycua.com/flight2014/can-nay-160920_CNY.html";
        request.Headers["Origin"] = "http://www.flycua.com";
        request.Headers["X-Requested-With"] = "XMLHttpRequest";
        request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36";
        request.Timeout = 10000;
        request.ReadWriteTimeout = 10000;
        request.ContentType = "application/x-www-form-urlencoded";

        try
        {
            using (Stream requestStream = request.GetRequestStream())
            {
                requestStream.Write(postBytes, 0, postBytes.Length);
            }

            StringBuilder html = new StringBuilder();
            using (WebResponse response = request.GetResponse())
            using (Stream responseStream = response.GetResponseStream())
            {
                // Pick a decompressor from Content-Encoding; read the stream
                // directly when the response is not compressed.
                String encoding = response.Headers["Content-Encoding"];
                Stream stream;
                if (encoding == "gzip")
                {
                    stream = new GZipStream(responseStream, CompressionMode.Decompress);
                }
                else if (encoding == "deflate")
                {
                    stream = new DeflateStream(responseStream, CompressionMode.Decompress);
                }
                else
                {
                    stream = responseStream;
                }

                using (StreamReader responseStreamReader = new StreamReader(stream))
                {
                    html.Append(responseStreamReader.ReadToEnd());
                }
            }
            Console.WriteLine(html);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

Python code:

```Python
import random

from requests import post


def get_url():
    # Append a random number as the rand query parameter.
    return "http://www.flycua.com/otabooking/flight-search!doFlightSearch.shtml?rand={}".format(random.random())


def get_data():
    # dept, arr, date
    data = {
        "searchCond": """{"tripType":"OW","adtCount":1,"chdCount":0,"infCount":0,"currency":"CNY","sortType":"a","segmentList":[{"deptCd":"CAN","arrCd":"NAY","deptDt":"2016-09-29","deptCityCode":"CAN","arrCityCode":"BJS"}],"sortExec":"a","page":"0"}""",
    }
    # requests form-encodes the dict and sends its default headers.
    return post(get_url(), data).content.decode("utf-8")


if __name__ == "__main__":
    print(get_data())
```
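
The C# snippet chooses a decompressor from the `Content-Encoding` response header. As a point of comparison, here is a minimal Python sketch of that selection which also passes an uncompressed body through untouched. The helper name `decode_body` is hypothetical, the sketch uses only the standard library, and it is not claimed to explain why the server cuts the response short.

```python
import gzip
import zlib


def decode_body(raw, content_encoding=None):
    """Return the decompressed body bytes for a Content-Encoding value."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "deflate":
        try:
            # HTTP "deflate" is normally zlib-wrapped data.
            return zlib.decompress(raw)
        except zlib.error:
            # Some servers send raw deflate without the zlib header.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # No (or unrecognised) encoding: return the bytes unchanged.
    return raw


if __name__ == "__main__":
    sample = b'{"ok": true}'
    # Round-trip a gzip-compressed sample through the helper.
    print(decode_body(gzip.compress(sample), "gzip"))
```

When the body is fetched with `requests`, as in the Python script, this step is not needed for gzip or deflate, since `response.content` is already decoded.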
