1. Preparation
$ sudo apt-get install tor torsocks
and, optionally, the following:
$ sudo apt-get install chromium-browser
$ sudo apt-get install curl wget
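Before moving on, it can help to confirm that the commands used below are actually on PATH. This is a minimal bash sketch that assumes the Ubuntu command names used in this post. The function name `missing_tools` is hypothetical and not part of any package.

```bash
#!/usr/bin/env bash
# Sketch: print the names of any requested commands that are not on PATH.
# The command names are assumptions based on the Ubuntu packages above;
# some distributions call the Chromium binary "chromium" instead.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to a command
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}
```

Usage: the following prints any tools that are still missing and returns non-zero if at least one is absent.
$ missing_tools tor torsocks torify curl wget chromium-browser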
# -----------------------------------------------------
2. Usage
$ /usr/bin/chromium-browser --headless --disable-gpu --dump-dom --proxy-server="socks5://localhost:9050" --host-resolver-rules="MAP * ~NOTFOUND , EXCLUDE localhost" http://www.ugtop.com/ > index.html
This is for pages whose content is generated by JavaScript.
For static pages, either of the following is simpler:
$ torify wget -qO- -U curl http://www.ugtop.com > index.html
$ curl -s --socks5-hostname 127.0.0.1:9050 http://www.ugtop.com > index.html
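The two cases above (pages that need JavaScript and static pages) can be combined in one shell function. This is an untested sketch, not a finished tool. The names `fetch_via_tor`, its `js` mode argument, and the `TOR_SOCKS`, `CHROME` and `CURL` variables are all hypothetical. The defaults assume Tor's SOCKS port is 127.0.0.1:9050, as in the commands above. Setting `CHROME=echo` or `CURL=echo` turns a call into a dry run that only prints the arguments.

```bash
#!/usr/bin/env bash
# Sketch: fetch a page through Tor, using headless Chromium for
# JavaScript-rendered pages and curl otherwise.
# TOR_SOCKS, CHROME and CURL are hypothetical overridable settings.
TOR_SOCKS="${TOR_SOCKS:-127.0.0.1:9050}"
CHROME="${CHROME:-/usr/bin/chromium-browser}"
CURL="${CURL:-curl}"

fetch_via_tor() {
  local url=$1 mode=${2:-static}
  if [ "$mode" = "js" ]; then
    # Same flags as the Chromium command above; the resolver rule stops
    # Chromium from resolving hostnames itself, outside the proxy.
    "$CHROME" --headless --disable-gpu --dump-dom \
      --proxy-server="socks5://$TOR_SOCKS" \
      --host-resolver-rules="MAP * ~NOTFOUND , EXCLUDE localhost" \
      "$url"
  else
    # --socks5-hostname lets the proxy (Tor) resolve the hostname.
    "$CURL" -s --socks5-hostname "$TOR_SOCKS" "$url"
  fi
}
```

Usage:
$ fetch_via_tor http://www.ugtop.com/ js > index.html
$ fetch_via_tor http://www.ugtop.com > index.html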
# -----------------------------------------------------
3. Changing the source IP on each access
$ sudo systemctl restart tor
or
$ sudo /etc/init.d/tor reload
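After a restart, Tor may need a moment before its SOCKS port accepts connections again, so a request sent immediately can fail. The sketch below polls the port. It assumes bash, because it relies on bash's `/dev/tcp` redirection, and the name `wait_for_port` is hypothetical. An open port does not by itself mean Tor has finished building circuits, so a retry may still be needed.

```bash
#!/usr/bin/env bash
# Sketch: wait until HOST:PORT accepts TCP connections, trying up to
# TRIES times (default 30) with one second between attempts.
# Requires bash: /dev/tcp/HOST/PORT is a bash redirection feature.
wait_for_port() {
  local host=$1 port=$2 tries=${3:-30}
  while :; do
    # Opening fd 3 succeeds only if the TCP connect succeeds; the
    # subshell closes the connection again when it exits.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    tries=$((tries - 1))
    [ "$tries" -le 0 ] && return 1
    sleep 1
  done
}
```

Usage: the following restarts Tor, waits for the SOCKS port, and then fetches the page.
$ sudo systemctl restart tor && wait_for_port 127.0.0.1 9050 && curl -s --socks5-hostname 127.0.0.1:9050 http://www.ugtop.com > index.html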
# -----------------------------------------------------
4. References
・Configuring a SOCKS proxy server in Chrome
http://www.chromium.org/developers/design-documents/network-stack/socks-proxy
・How to use Wget with Tor Bundle in Linux
https://superuser.com/questions/404732/how-to-use-wget-with-tor-bundle-in-linux
・俺的備忘録 〜なんかいろいろ〜 ("My personal notes: this and that", 2017/07/13)
https://orebibou.com/2017/07/ubuntu-server-16-04-ltscentos-7%E3%81%ABtor%E3%82%92%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%BC%E3%83%AB%E3%81%97%E3%81%A6tor%E7%B5%8C%E7%94%B1%E3%81%A7curl%E3%82%84wget%E3%82%92%E4%BD%BF%E7%94%A8/
... and so on ...
・Getting Started with Headless Chrome
https://developers.google.com/web/updates/2017/04/headless-chrome
・How to take a screenshot of a page N seconds after page is loaded with Chrome Headless? (2017/08/16)
https://superuser.com/questions/1209741/how-to-take-a-screenshot-of-a-page-n-seconds-after-page-is-loaded-with-chrome-he
・Make Chrome Headless to Wait for Ajax Before Printing to PDF (2018/04/12)
https://stackoverflow.com/questions/49614437/make-chrome-headless-to-wait-for-ajax-before-printing-to-pdf
I tried specifying --virtual-time-budget and --delay, but did not get the result I expected, so I may try the following later.
・GoogleChrome/puppeteer (2017/08/17)
https://github.com/GoogleChrome/puppeteer/issues/338
・Puppeteerがクローリングに使えるかも.md ("Puppeteer might be usable for crawling", 2017/12/16)
https://gist.github.com/sys9kdr/477d4c44b51c722331951c3f4b0b0c13