Scraping Amazon organic rankings by keyword with Jsoup

it 2026-03-30 15

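The approach: assemble the search request URL according to Amazon's query rules, use Jsoup to simulate a browser request and fetch the whole results page, then analyze the page's front-end markup and filter out the parts you need.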
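The post does not show how the search URLs are assembled (`OtherUtil.getSite`). The following is only a hypothetical sketch of that step. It assumes the UK search URL takes the keyword in `k` and the page number in `page`, for example `https://www.amazon.co.uk/s?k=...&page=2`. It uses only the JDK, and `URLEncoder.encode(String, Charset)` requires Java 10+. The class and method names are made up for illustration, and cookie and region handling are left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the author's OtherUtil.getSite (not shown in the post).
// ASSUMPTION: UK search URLs look like https://www.amazon.co.uk/s?k=<keyword>&page=<n>.
class AmazonSearchUrls {

    static final String UK_SEARCH_BASE = "https://www.amazon.co.uk/s";

    /** Builds one search URL per result page, for pages 1..pages inclusive. */
    static List<String> buildSearchUrls(String keyword, int pages) {
        // URL-encode the keyword so spaces and non-ASCII characters survive (a space becomes '+')
        String k = URLEncoder.encode(keyword, StandardCharsets.UTF_8);
        List<String> urls = new ArrayList<>();
        for (int page = 1; page <= pages; page++) {
            urls.add(UK_SEARCH_BASE + "?k=" + k + "&page=" + page);
        }
        return urls;
    }

    public static void main(String[] args) {
        // The post only searches the first five pages
        for (String url : buildSearchUrls("wireless mouse", 5)) {
            System.out.println(url);
        }
    }
}
```

For the keyword `wireless mouse`, the first URL is `https://www.amazon.co.uk/s?k=wireless+mouse&page=1`. The post's scraper then requests each URL in turn.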

// Core code

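import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Capture, Productpage and OtherUtil are the author's own project classes (not shown).
public void amazonData(Capture capture, List<Productpage> productpageList) {
    // getSite fills map with cookies and returns one assembled search URL per result page
    Map<String, String> map = new HashMap<>();
    List<String> siteList = OtherUtil.getSite(capture, map);
    for (int i = 0; i < siteList.size(); i++) {
        try {
            // Random pause of up to 5 seconds between page requests
            Thread.sleep((int) (Math.random() * 5000));
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop instead of swallowing the interruption
            Thread.currentThread().interrupt();
            return;
        }
        try {
            // siteList.get(i) is the assembled search URL; send the simulated cookies and pose as Google Chrome
            Document document = Jsoup.connect(siteList.get(i))
                    .cookies(map)
                    .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36")
                    .get();
            // Pick out the search results by the pattern in the page markup:
            // <div data-asin="..." data-component-type="s-search-result">
            int num = 1; // position on the current page
            for (Element result : document.select("div[data-asin][data-component-type=s-search-result]")) {
                // Collect every product on the page; the ranking is worked out later by comparing against the database
                Productpage productpage = new Productpage();
                productpage.setAntistop(capture.getAntistop());
                productpage.setSite(capture.getSite());
                productpage.setDivDate(result.outerHtml());
                productpage.setPageNo(String.valueOf(i + 1));
                productpage.setPm(String.valueOf(num));
                productpageList.add(productpage);
                num++;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    // UK site URL assembly: k = keyword, i = page number; only the first five pages are searched.
    // map holds the cookies. Since I didn't know how to make the simulated browser enter a region/postcode,
    // I took the crudest route and hard-coded it; this can certainly be improved later.
}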
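The code stores each result's page number (`pageNo`) and its position on that page (`pm`). The post says the actual ranking is done later against a database, and that step is not shown. As a hypothetical sketch, assume you have the ASINs of each scraped page in display order, for example from each result's `data-asin` attribute. Under that assumption, a product's overall organic position is the number of results on earlier pages plus its position on its own page. `OrganicRank`, `findRank` and `Rank` are illustrative names, and the record requires Java 16+.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the "compare against the database" step mentioned in the post.
// Input: ASINs in the order they appeared on each scraped page (page 1 first).
class OrganicRank {

    /** 1-based page, 1-based position on that page (like pm), and 1-based overall position. */
    record Rank(int page, int positionOnPage, int overall) {}

    static Optional<Rank> findRank(List<List<String>> asinsByPage, String targetAsin) {
        int seenBefore = 0; // number of results on earlier pages
        for (int p = 0; p < asinsByPage.size(); p++) {
            List<String> page = asinsByPage.get(p);
            int idx = page.indexOf(targetAsin);
            if (idx >= 0) {
                return Optional.of(new Rank(p + 1, idx + 1, seenBefore + idx + 1));
            }
            seenBefore += page.size();
        }
        return Optional.empty(); // not found in the scraped pages
    }

    public static void main(String[] args) {
        // Made-up placeholder ASINs, for illustration only
        List<List<String>> pages = List.of(
                List.of("A", "B", "C"),
                List.of("D", "E"));
        System.out.println(findRank(pages, "E"));
    }
}
```

Suppose the scraped pages are `[A, B, C]` and `[D, E]`. Looking up `E` then gives page 2, position 2, overall position 5. Pages can hold different numbers of results, so the sketch counts the preceding results rather than assuming a fixed page size.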
