We can use robots.txt, but what should happen when this file is not respected?
I checked a few sites, and the user agent is just Google Chrome running on Windows 10. So they're using headless browsers to scrape content, ignoring robots.txt, and not identifying themselves in their user agent string. I can't even block their IP ranges, because these headless browsers don't appear to be running from their IP ranges.
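For context, here is a minimal sketch of what respecting robots.txt looks like on the crawler side, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` name, and the `is_allowed` helper are all hypothetical and only there for illustration. It also shows why the file can't help here: the rules are matched against whatever user agent the client claims, and the check only runs if the crawler chooses to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch this from the site.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/blog/post"
    # A crawler that identifies itself honestly is blocked by its own rule.
    print(is_allowed("ExampleBot/1.0", url))
    # The same crawler posing as a generic browser only hits the "*" rule.
    print(is_allowed("Mozilla/5.0", url))
```

A scraper that presents itself as ordinary Chrome falls under the `*` rule at best, and one that never performs this check is not affected by robots.txt at all.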