> Who uses service providers like this? I use change detection to monitor all so...

pwdisswordfishs · 2026-06-18T03:15:07 1781752507

> I use change detection to monitor all sorts of websites for changes. Some of my favorite authors don't have RSS.

Have you considered offering, as penitence, a public feed to share the information that this process produces?

saltmate · 2026-06-18T05:31:39 1781760699

Did you ask them for an RSS feed? Lots of people are pretty reasonable for such requests if you write a nice email.

baby_souffle · 2026-06-18T23:20:02 1781824802

> Did you ask them for an RSS feed? Lots of people are pretty reasonable for such requests if you write a nice email.

Yep!

They're busy people, or they just don't feel the need to do anything beyond hitting the "publish" button on their CMS and calling it good. That's fine, and it's why I have a robot to make an RSS feed for me :).
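
For illustration only, here is a minimal sketch of the kind of change detection such a robot might do. It assumes pages are already fetched as text and that comparing a whitespace-normalized SHA-256 hash is good enough; the names `fingerprint` and `detect_changes` are hypothetical and not taken from any particular tool.

```python
import hashlib
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Hash page text after collapsing whitespace, so reflowed markup is not a 'change'."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_changes(seen: Dict[str, str], pages: Dict[str, str]) -> List[str]:
    """Return URLs whose content hash differs from the stored one, updating `seen` in place."""
    changed = []
    for url, text in pages.items():
        digest = fingerprint(text)
        if seen.get(url) != digest:
            changed.append(url)
            seen[url] = digest
    return changed
```

Each URL returned by `detect_changes` could then become one entry in a generated feed.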

arianvanp · 2026-06-17T21:26:31 1781731591

The litmus test here is whether they support https://blog.cloudflare.com/introducing-pay-per-crawl/ out of the box or not

They do not.

baby_souffle · 2026-06-18T01:43:54 1781747034

I don't know who 'they' is here but that's not the point? I would bet that a decent chunk of scraping happens because there is no API (or other machine-focused interface, like RSS).

"pay to crawl" sounds like the absolutely laziest possible way that a particular site could bolt on an API.

mannanj · 2026-06-18T02:13:05 1781748785

Has anyone used this successfully? I'm wondering what kind of payouts they are getting, or whether the crawlers are just avoiding those sites.

inigyou · 2026-06-18T11:58:27 1781783907

"we don't negotiate with terrorists"

devmor · 2026-06-18T02:32:35 1781749955

This attitude, and by proxy this business, are the epitome of selfish entitlement.

You state that you believe you deserve access to others’ resources, at their cost, despite their clear attempts to stop you from using them, simply because you want it.

fc417fc802 · 2026-06-18T04:40:37 1781757637

I'd counter that your attitude is a techno-authoritarian one. Why should anyone have any say over how I access and use a publicly available resource? At least so long as my actions don't directly cause technical problems for the service operator.

saltmate · 2026-06-18T05:34:11 1781760851

> At least so long as my actions don't directly cause technical problems for the service operator.

That's the point of the criticism. The praise of their anti-anti-bot features reads as though the product is commonly used in ways that cause technical problems for service providers, whether that is intended or merely accepted as a side effect.

pocksuppet · 2026-06-18T13:00:20 1781787620

Anti-bot features are definitely used to cause technical problems to service providers you don't like.

devmor · 2026-06-18T11:54:07 1781783647

> At least so long as my actions don't directly cause technical problems for the service operator.

But they do.

The reasoning you’re describing is not altruistic. It’s the same reasoning used by every AI scraper.

It’s the very reason I am paying a couple hundred dollars out of my own pocket every month to keep the websites of hundreds of small businesses and hobbyists online while I try to help them move to bigger cloud hosts, when I used to turn a small profit from it.

fc417fc802 · 2026-06-18T13:25:05 1781789105

> The reasoning you’re describing is not altruistic. It’s the same reasoning used by every AI scraper.

I think that's bad faith on your part. Clearly AI scrapers are aware of what they are doing and simply don't care. The entire purpose of my including the bit you quoted there was to explicitly exclude that sort of behavior.

inigyou · 2026-06-18T11:57:10 1781783830

Maybe if you weren't using expensive anti-bot solutions people wouldn't use expensive bots.

devmor · 2026-06-18T12:38:01 1781786281

That’s a great theory. Unfortunately, it’s defeated by the fact that I didn’t need to use anti-bot solutions until bot traffic got me charged for 38,000x my normal ingress traffic in a single month.

pocksuppet · 2026-06-18T13:00:35 1781787635

How much traffic was that?

hirsin · 2026-06-18T05:16:10 1781759770

Look, it wasn't _my_ request that made the server fall over, it must have been one of the other several thousand thoughtless scrapers running on the website that caused it to die.

fc417fc802 · 2026-06-18T06:40:49 1781764849

If you're claiming that the operators of high volume AI scrapers that wantonly disregard rate limits and all common sense are unethical then I'm right there with you. But that's not at all what was described upthread nor is it the only way in which bots get used by any stretch of the imagination.

As far as anti-bot countermeasures go I quite like proof of work solutions since those disproportionately impact high volume scrapers without noticeably impeding a small hobby project.
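
A minimal hashcash-style sketch of the proof-of-work idea, assuming SHA-256 and a difficulty measured in leading zero bits. The function names and the `challenge:nonce` format are hypothetical and not taken from any specific anti-bot product.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash checks the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce, about 2**difficulty hashes on average."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce
```

The asymmetry is the point: a client fetching a few pages pays a barely noticeable cost per request, while a scraper making millions of requests pays that cost millions of times.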

Unfortunately the operators of many major websites appear to want something akin to DRM with the excuse of bots used merely as window dressing.

ssl-3 · 2026-06-18T07:02:19 1781766139

There was a time when a person could walk through a few department stores every week (or even every day) just to take note of some prices along the way, and ultimately tabulate them to try to identify and snatch up the best deal once it happens.

And if everyone did this, it'd be a real problem. The stores would be clogged up by geeks writing notes in little books with Parker Jotters and just basically wasting space and taking up air conditioning while they sleuth out the best way to put the screws to the company for a few measly dollars.

That'd be awful.

But not many people ever did that in stores, and not many individual people are doing that today with the web. It's really not a problem.

(And if a website in 2026 can't stand the burn of several thousand personal scrapers that are operated by people who actually want to buy stuff from it, then maybe that system simply sucks and needs to be rethought.)

baby_souffle · 2026-06-18T23:27:29 1781825249

> There was a time when a person could walk through a few department stores every week (or even every day) just to take note of some prices along the way, and ultimately tabulate them to try to identify and snatch up the best deal once it happens.

This is how it started! I noticed certain things during my weekly shop that I did a double-take on and thought "wasn't that $cheaper last week!?". Took me ~ 45 min to figure out that the retailer actually has a really nice GraphQL endpoint that powers the "view your previous receipts" function on their website. Of course they don't document this / make it available for 3rd parties... so scrape it is!

I wrote a bot to dump every receipt into a sqlite DB and I fire it up ~ weekly to pull down receipts that it doesn't have locally.
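
A rough sketch of the "only pull down what isn't stored locally" part, assuming each receipt has a stable ID. The table layout and the `(receipt_id, purchased_on, total_cents)` fields are hypothetical, and the retailer's endpoint is deliberately left out.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical receipt shape: (receipt_id, purchased_on, total_cents)
Receipt = Tuple[str, str, int]


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the receipts table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS receipts ("
        " receipt_id TEXT PRIMARY KEY,"
        " purchased_on TEXT NOT NULL,"
        " total_cents INTEGER NOT NULL)"
    )
    return conn


def store_new(conn: sqlite3.Connection, receipts: Iterable[Receipt]) -> int:
    """Insert receipts, skipping IDs already stored; return how many rows were added."""
    before = conn.execute("SELECT COUNT(*) FROM receipts").fetchone()[0]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO receipts VALUES (?, ?, ?)", receipts
        )
    after = conn.execute("SELECT COUNT(*) FROM receipts").fetchone()[0]
    return after - before
```

The primary key plus `INSERT OR IGNORE` makes a weekly run idempotent: re-fetching an overlapping window of receipts adds nothing twice.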

Turns out, not _everything_ has gotten more expensive @ my local grocery store over the past few years... just most things have :/.

> But not many people ever did that in stores,

There's a cottage-industry of firms out there that get gig-workers to pop in to $randomStore and take a picture of $randomItem on shelf w/ the price tag in the photo. The firms sell this info to stores that want to know how a competitor might be doing pricing / placing certain items on the more valuable shelf spots.

> and not many individual people are doing that today with the web. It's really not a problem.

That's my point! I scrape a few hundred pages per day across _many_ domains. My bots respect 429s and they have some other backoff/random-jitter strategies baked in to _not_ be the reason anti-scrape proliferates.
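
A sketch of what respecting 429s with backoff and random jitter can look like, assuming exponential backoff with "full jitter" and an optional `Retry-After` value in seconds. The `send` callable and the `(status, retry_after, body)` response shape are hypothetical stand-ins for a real HTTP client.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Hypothetical response shape: (status_code, retry_after_seconds_or_None, body)
Response = Tuple[int, Optional[float], str]


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 300.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before the next try; `attempt` is 0-based."""
    if retry_after is not None:
        # The server said how long to wait: never come back sooner than that.
        return max(0.0, retry_after) + rng()
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)).
    return rng() * min(cap, base * (2 ** attempt))


def polite_fetch(send: Callable[[], Response], max_attempts: int = 5,
                 sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call `send`, backing off on 429/503 and giving up on anything else."""
    for attempt in range(max_attempts):
        status, retry_after, body = send()
        if status == 200:
            return body
        if status not in (429, 503):
            return None  # other errors: stop rather than hammer the server
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, retry_after))
    return None
```

The jitter keeps many independent bots from retrying in lockstep, and the cap keeps a long outage from producing absurdly long waits.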

irjustin · 2026-06-18T03:06:46 1781752006

Personally I consider it fair game in "price wars".

Dynamic pricing is designed to extract every penny. So why shouldn't I be allowed to monitor your pricing changes?

kjkjadksj · 2026-06-18T04:28:35 1781756915

You can already access these resources. What does it matter if you do the clicking or you have headless Chrome do the clicking while you make a cup of coffee?

baby_souffle · 2026-06-18T13:44:40 1781790280

This is my entire point!

And it’s why scrapers will always win; absolutely worst case, I get a screenshot of the content and have to process it further.

Banditoz · 2026-06-18T04:05:05 1781755505

Dude, it's a web request. It's not that deep.

devmor · 2026-06-18T11:50:23 1781783423

Dude, as someone who runs web servers, my pockets are not that deep.

I’m struggling to keep the websites of hundreds of hobbyists and small businesses alive right now because of people like this.

inigyou · 2026-06-18T11:58:03 1781783883

How many requests per second are you getting?

giancarlostoro · 2026-06-18T05:12:20 1781759540

I mean, this is how Google was built.

cryptonym · 2026-06-18T13:00:17 1781787617

Not a fair statement. Google wasn't built on bypassing bot protections.

Google is providing a service to the websites they crawl.

They try not to crawl when we don't want them to (robots.txt, a clear user-agent, noindex/nofollow...).
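
As a small illustration of the robots.txt part, here is how a crawler can check permission using Python's standard `urllib.robotparser`. The robots.txt contents and the `ExampleBot` user-agent are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: ExampleBot may crawl everything except /private/,
# all other bots are asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Nothing enforces this check; it only works when the crawler identifies itself honestly and chooses to obey the answer.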

giancarlostoro · 2026-06-18T14:22:31 1781792551

> Google is providing a service to the websites they crawl.

Yeah they're building an LLM and making it pointless to visit the websites.

cryptonym · 2026-06-18T14:46:50 1781794010

Let's agree on "Google was providing a service". Current and future state can be questionable.

Btw you can still block it.