I use change detection to monitor all sorts of websites for changes. Some of my favorite authors don't have RSS.
I always set up price monitoring for any big ticket item I'm considering like appliances so I can see how their pricing changes over time.
I also use scrapers for websites that don't have an API. I like having all of my purchase history indexed in a database where I can do analysis.
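For anyone wondering what "change detection" amounts to in practice, here is a rough sketch, not what any particular tool does. It assumes you already have the page text in hand and just want to know whether it differs from the last run. The names `fingerprint` and `has_changed` are made up for illustration.

```python
import hashlib
from typing import Optional, Tuple


def fingerprint(content: str) -> str:
    """Hash whitespace-normalized text so trivial reflows don't count as changes."""
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous: Optional[str], content: str) -> Tuple[bool, str]:
    """Return (changed, new_fingerprint).

    The first time a page is seen (previous is None) it is recorded,
    not reported as a change.
    """
    current = fingerprint(content)
    changed = previous is not None and current != previous
    return changed, current
```

Store the returned fingerprint per URL and pass it back in on the next run; anything fancier (diffing, ignoring ads, extracting just the price) builds on the same idea.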
> These kinds of services inevitably make the web more human-hostile and expensive.
I would rather not have to spend more time circumventing stupid bot detection things. I would be more than happy to pay for access to some of this data that I cannot access any other way... but sure, let's keep burning resources on a cat and mouse game that scrapers will always be able to win.
> Did you ask them for an RSS feed? Lots of people are pretty reasonable for such requests if you write a nice email.
Yep!
They're busy people, or they just don't feel the need to do anything beyond hitting the "publish" button on their CMS and calling it good. That's fine, and it's why I have a robot to make an RSS feed for me :).
I don't know who 'they' is here but that's not the point? I would bet that a decent chunk of scraping happens because there is no API (or other machine-focused interface, like RSS).
"Pay to crawl" sounds like the absolute laziest possible way that a particular site could bolt on an API.
This attitude, and by proxy this business, are the epitome of selfish entitlement.
You state that you believe you deserve access to others’ resources, at their cost, despite their clear attempts to stop you from using them, simply because you want it.
I'd counter that your attitude is a techno-authoritarian one. Why should anyone have any say over how I access and use a publicly available resource? At least so long as my actions don't directly cause technical problems for the service operator.
> At least so long as my actions don't directly cause technical problems for the service operator.
That's the point of the criticism. The praise of their anti-anti-bot features reads as though the service is commonly used in ways that cause technical problems for service providers, whether that is intended or merely accepted as a side effect.
> At least so long as my actions don't directly cause technical problems for the service operator.
But they do.
The reasoning you’re describing is not altruistic. It’s the same reasoning used by every AI scraper.
It’s the very reason I am paying a couple hundred dollars out of my own pocket every month to keep the websites of hundreds of small businesses and hobbyists online while I try to help them move to bigger cloud hosts, when I used to turn a small profit from it.
> The reasoning you’re describing is not altruistic. It’s the same reasoning used by every AI scraper.
I think that's bad faith on your part. Clearly AI scrapers are aware of what they are doing and simply don't care. The entire purpose of my including the bit you quoted there was to explicitly exclude that sort of behavior.
That’s a great theory. Unfortunately, it’s defeated by the fact that I didn’t need to use anti-bot solutions until bot traffic got me charged for 38,000x my normal ingress traffic in a single month.
Look, it wasn't _my_ request that made the server fall over, it must have been one of the other several thousand thoughtless scrapers running on the website that caused it to die.
If you're claiming that the operators of high volume AI scrapers that wantonly disregard rate limits and all common sense are unethical then I'm right there with you. But that's not at all what was described upthread nor is it the only way in which bots get used by any stretch of the imagination.
As far as anti-bot countermeasures go I quite like proof of work solutions since those disproportionately impact high volume scrapers without noticeably impeding a small hobby project.
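To make the asymmetry concrete, here is a minimal hashcash-style sketch. It is an illustration of the general idea, not how any specific anti-bot product works; the challenge string format, the function names, and the choice of SHA-256 are all assumptions. The client has to grind through nonces, while the server checks the answer with a single hash.

```python
import hashlib
from itertools import count


def _digest_value(challenge: str, nonce: int) -> int:
    """Interpret SHA-256(challenge:nonce) as a big integer."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: find a nonce whose hash has `difficulty_bits` leading zero bits.

    Expected work is about 2**difficulty_bits hashes.
    """
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        if _digest_value(challenge, nonce) < target:
            return nonce


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash, regardless of how long the client searched."""
    return _digest_value(challenge, nonce) < (1 << (256 - difficulty_bits))
```

A hobby scraper hitting a few hundred pages a day pays this cost a few hundred times. Something hammering millions of pages pays it millions of times, which is where it starts to hurt.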
Unfortunately the operators of many major websites appear to want something akin to DRM with the excuse of bots used merely as window dressing.
There was a time when a person could walk through a few department stores every week (or even every day) just to take note of some prices along the way, and ultimately tabulate them to try to identify and snatch up the best deal once it happens.
And if everyone did this, it'd be a real problem. The stores would be clogged up by geeks writing notes in little books with Parker Jotters and just basically wasting space and taking up air conditioning while they sleuth out the best way to put the screws to the company for a few measly dollars.
That'd be awful.
But not many people ever did that in stores, and not many individual people are doing that today with the web. It's really not a problem.
(And if a website in 2026 can't stand the burn of several thousand personal scrapers that are operated by people who actually want to buy stuff from it, then maybe that system simply sucks and needs to be rethought.)
> There was a time when a person could walk through a few department stores every week (or even every day) just to take note of some prices along the way, and ultimately tabulate them to try to identify and snatch up the best deal once it happens.
This is how it started! I noticed certain things during my weekly shop that I did a double-take on and thought "wasn't that $cheaper last week!?". Took me ~45 min to figure out that the retailer actually has a really nice GraphQL endpoint that powers the "view your previous receipts" function on their website. Of course they don't document this / make it available for 3rd parties... so scrape it is!
I wrote a bot to dump every receipt into a SQLite DB, and I fire it up ~weekly to pull down the receipts that it doesn't have locally.
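The "only pull what I don't have" part is simple enough to sketch. This is a guess at the general shape, not the actual bot: the table layout, the receipt fields (`id`, `date`, `total_cents`), and the function names are all hypothetical, and the fetching itself is left out since the retailer's endpoint isn't documented.

```python
import json
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local receipt store."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS receipts (
               receipt_id   TEXT PRIMARY KEY,
               purchased_on TEXT,
               total_cents  INTEGER,
               raw_json     TEXT
           )"""
    )
    return conn


def missing_ids(conn: sqlite3.Connection, remote_ids: list) -> list:
    """Return the remote receipt IDs that are not stored locally yet."""
    known = {row[0] for row in conn.execute("SELECT receipt_id FROM receipts")}
    return [rid for rid in remote_ids if rid not in known]


def store_receipt(conn: sqlite3.Connection, receipt: dict) -> None:
    """Insert one receipt; re-running on the same receipt is a no-op."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO receipts VALUES (?, ?, ?, ?)",
            (receipt["id"], receipt["date"], receipt["total_cents"],
             json.dumps(receipt)),
        )
```

Each weekly run would list the receipt IDs the site shows, call `missing_ids`, and fetch only those, which also keeps the request count low.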
Turns out, not _everything_ has gotten more expensive @ my local grocery store over the past few years... just most things have :/.
> But not many people ever did that in stores,
There's a cottage-industry of firms out there that get gig-workers to pop in to $randomStore and take a picture of $randomItem on shelf w/ the price tag in the photo. The firms sell this info to stores that want to know how a competitor might be doing pricing / placing certain items on the more valuable shelf spots.
> and not many individual people are doing that today with the web. It's really not a problem.
That's my point! I scrape a few hundred pages per day across _many_ domains. My bots respect 429s and they have some other backoff/random-jitter strategies baked in to _not_ be the reason anti-scrape proliferates.
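Roughly what "respect 429s with backoff and jitter" can look like, as a sketch under some assumptions: `fetch` is a stand-in for whatever HTTP client you use and is assumed to return `(status, headers, body)` with `headers` as a plain dict, and only the seconds form of `Retry-After` is handled (the HTTP-date form is ignored here). The names and default values are illustrative.

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=1.0, cap=300.0):
    """Seconds to wait before the next try.

    If the server sent Retry-After, wait at least that long plus a little
    jitter. Otherwise use capped exponential backoff with full jitter.
    """
    if retry_after is not None:
        return retry_after + random.uniform(0, base)
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def polite_get(fetch, url, max_attempts=5, sleep=time.sleep):
    """Call fetch(url) and back off whenever the server answers 429.

    Returns (status, body), or (429, None) if every attempt was throttled.
    """
    for attempt in range(max_attempts):
        status, headers, body = fetch(url)
        if status != 429:
            return status, body
        raw = headers.get("Retry-After")
        retry_after = float(raw) if raw and raw.isdigit() else None
        sleep(backoff_delay(attempt, retry_after))
    return 429, None
```

The jitter matters because without it a pile of bots that all got throttled at the same moment will all come back at the same moment too.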
You can already access these resources. What does it matter if you do the clicking or you have headless chrome do the clicking while you make a cup of coffee?