joatmon-snoo · 2026-06-17T19:02:15 1781722935

An example I ran into recently: I wanted to scrape pricing data for used cars, to better inform a friend's decision about what to purchase.

I know there's a relationship between mileage and depreciation, but wanted to have a better sense of what that relationship is to know whether a given car was over or underpriced.

Similarly, if I was pulling that data to build a service of my own to offer to users... is that unethical?
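
One way to get a feel for the mileage and price relationship mentioned above is an ordinary least-squares line through the listings, then a look at how far each car sits from that line. This is a minimal sketch, assuming a straight-line fit is good enough for a single model and year range; the listings are made-up numbers, not scraped data, and `fit_line` and `residuals` are hypothetical helper names.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def residuals(listings):
    """Return (mileage, price, price - predicted price) for each listing."""
    miles = [m for m, _ in listings]
    prices = [p for _, p in listings]
    slope, intercept = fit_line(miles, prices)
    return [(m, p, p - (slope * m + intercept)) for m, p in listings]

# Made-up (mileage, asking price) pairs, for illustration only.
listings = [(20_000, 24_000), (40_000, 21_500), (60_000, 19_000),
            (80_000, 18_500), (100_000, 14_000)]

for miles, price, delta in residuals(listings):
    label = "above trend" if delta > 0 else "below trend"
    print(f"{miles:>7} mi  ${price:>6}  {delta:+8.0f}  {label}")
```

A positive residual suggests a listing is priced above the trend for its mileage, a negative one below it; condition, trim, and region are ignored here and would matter in practice.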

sroussey · 2026-06-17T19:31:30 1781724690

All of these questions are easily answered by the question: can I run the bot on the same PC I use regularly? If so, then do it there. If not, then don’t do it at all.

pocksuppet · 2026-06-18T21:20:12 1781817612

This is often really good for your bot, because anti-bot providers are loath to block what, as far as they know, is a residential CGNAT address. Sometimes you get more success scraping from home with Firefox or Chrome than with an army of proxy networks.

fc417fc802 · 2026-06-18T04:32:56 1781757176

Why should technical capability to evade countermeasures dictate whether or not something is ethical? My view is that scraping remains ethical as long as your actions aren't causing technical problems for the operator. If anything, a retailer attempting to hide pricing data is what's unethical in my view.

adolph · 2026-06-17T21:40:00 1781732400

> scrape pricing data for used cars

Time was you could get lovely JSON feeds from every site by iterating on the inspector's curl statement. Nowadays you can't even use Selenium without Cloudflare getting grouchy. Last fall I had to build my spreadsheet like a caveperson: Ctrl-C, Ctrl-V. It wouldn't be so bad if the dealer aggregators' coverage was xor, but you have to dedupe listings. Then there is the whole issue of online salespeople who don't show up at the dealership.
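
The dedupe step can be sketched as keying each listing on its VIN and keeping the lowest price seen. This assumes every aggregator exposes a VIN, which may not hold; the field names (`vin`, `price`, `source`), the helper `dedupe_listings`, and the sample rows are all hypothetical.

```python
def dedupe_listings(listings):
    """Collapse listings that share a VIN, keeping the lowest price and all sources."""
    best = {}
    for item in listings:
        vin = item["vin"].strip().upper()  # normalize before comparing
        seen = best.get(vin)
        if seen is None:
            best[vin] = {"vin": vin, "price": item["price"],
                         "sources": {item["source"]}}
            continue
        seen["sources"].add(item["source"])
        seen["price"] = min(seen["price"], item["price"])
    return list(best.values())

# Invented rows: the first two are the same car listed on two aggregators.
rows = [
    {"vin": "1HGCM82633A004352", "price": 15500, "source": "aggregator_a"},
    {"vin": "1hgcm82633a004352 ", "price": 15200, "source": "aggregator_b"},
    {"vin": "2T1BURHE5JC034461", "price": 17900, "source": "aggregator_a"},
]

for car in dedupe_listings(rows):
    print(car["vin"], car["price"], sorted(car["sources"]))
```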

inigyou · 2026-06-18T12:02:25 1781784145

There's a JavaScript property called navigator.webdriver that returns true if selenium is in use. Obviously, every antibot system checks it. Obviously, you can patch it to always say false.

joatmon-snoo · 2026-04-15T00:18:41 1776212321

It’s really one of those things that you have to see in person and walk up and down along to understand. The scale of it alone is a huge part of what makes it so memorable.

hermitcrab · 2026-04-15T13:39:58 1776260398

Similarly for Gericault's 'Raft of the Medusa'. I had only seen it in a book and had no idea how big it was. It is enormous. Go see it if you are ever at the Louvre (much more impressive than the Mona Lisa IMHO).

joatmon-snoo · 2026-04-15T00:16:19 1776212179

Well, is Pixar’s Toy Story a work of art? Or what about Julia set renderings, where people make choices about the colors? ;)

Tongue-in-cheek aside, I do think I agree with you in that (1) art, as perceived by us human meatbags, is art because of the human element of it (if not in creation, then in perception), and that (2) AI absent explicit steering trends towards a rather bland medium.

But there’s art in everything from the blurry, out of focus, disposable film cameras, to a 5-year-old’s crayon scribble scrabbles, to the neon glitter themes we used to copy-paste over our geocities and xanga pages, and as frustrating as it is to our own sensibilities, an AI prompt “draw a pink elephant” isn’t all that different.

slg · 2026-04-15T00:45:33 1776213933

The element of creation is central to art. A painting or a photo of a sunset can be art, but the sunset itself is not art.

In addition, the communication doesn't need to be explicit or intentional. It can be communicating something antithetical to the artist's original intent like a blurry and out of focus photo. Or it can even be antithetical to the piece itself like a lot of modern art (Fountain[1] comes to mind). I'm also sure that the 5-year-old will happily tell you a story about why they scribbled what they did. I'm not diminishing any of those. But if all the person contributes is a prompt, the text of that prompt is the extent of their art.

[1] - https://en.wikipedia.org/wiki/Fountain_(Duchamp)

Jare · 2026-04-15T07:30:21 1776238221

> a photo of a sunset can be art ... > But if all the person contributes is a prompt, the text of that prompt is the extent of their art.

The natural question to pose is: when a person presses the shutter button on a camera, is that button press the extent of their art? Of course not; intent, sensibility, timing, understanding there's something special about what's in front of you, preparing a composition, orchestrating poses, framing to create a special composition, manipulating the medium via speed or exposure, etc. to create an appropriate texture... all those and more can play a part, and the button press is just the delivery method.

Millions of photos per day are not art even if they show a pretty thing, and nobody has a problem with that. Even when we actively try to capture something special, most people will later look at their photo and say "it shows the place, but it doesn't communicate anything like what being there made me feel".

So in the same way, I think the interesting discussion will be not that AI images are not art. Millions of prompts per day will not be art, and nobody (except grifters) has a problem with that. But how can AI become another vehicle for people to produce interesting art? Perhaps there's nothing special there. But I hope people with the drive to explore and the need to communicate continue giving it a shot and prove or disprove the notion that "it's just a prompt".

balamatom · 2026-04-15T02:58:25 1776221905

Is building a technofeudal entity and bootstrapping pseudo-AGI art?

cal_dent · 2026-04-15T07:00:19 1776236419

But it completely is different. To your point, it's why a 5-year-old's crayon scribble is more powerful to certain people than Guernica, for instance. History is littered with gazillions of scribbles, stray notes, meaningless stuff that just goes straight in the bin. AI will do that. But for something to communicate the feel of something you need warmth and emotional relatability.

joatmon-snoo · 2026-03-31T00:24:55 1774916695

That's called a compromised account, not Facebook sharing the email with a third party data broker.

halayli · 2026-03-31T02:52:14 1774925534

To be honest, what you did here is called speculation. Claims require evidence, and you provided none. Your confident tone is unjustified imo.

reaperducer · 2026-03-31T02:05:33 1774922733

Since all the alleged comments are allegedly from people he knew, and not new strangers, I find it hard to believe that someone has been impersonating him on Facebook for the last 15 years.

Especially people who were at his funeral.

joatmon-snoo · 2026-03-30T21:19:20 1774905560

Breaking something is easier than fixing it.

tptacek · 2026-03-30T21:22:05 1774905725

People have said that for decades and it wasn't true until recently.

joatmon-snoo · 2026-03-31T00:33:18 1774917198

Hmm: can you elaborate?

I've never been on a security-specific team, but it's always seemed to me that triggering a bug is, for the median issue, easier than fixing it, and I mentally extend that to security issues. This holds especially true if the "bug" is a question about "what is the correct behavior?", where the "current behavior of the system" is some emergent / underspecified consequence of how different features have evolved over time.

I know this is your career, so I'm wondering what I'm missing here.

tptacek · 2026-03-31T00:37:16 1774917436

It has generally been the case that (1) finding and (2) reliably exploiting vulnerabilities is much more difficult than patching them. In fact, patching them is often so straightforward that you can kill whole bug subspecies just by sweeping the codebase for the same pattern once you see a bug. You'd do that just sort of as a matter of course, without necessarily even verifying that the bugs you're squashing are exploitable.

As bugs get more complicated, that asymmetry has become less pronounced, but the complexity of the bugs (and their patches) is offset by the increased difficulty of exploiting them, which has become an art all its own.

LLMs sharply tilt that difficulty back to the defender.
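
The "sweep the codebase for the same pattern" step described above can be as simple as a regex walk over the source tree. A rough sketch, assuming the bug class can be spotted textually; `sweep` is a hypothetical helper, the `strcpy` pattern in the usage note is picked only as an example, and a real sweep would more likely use a syntax-aware tool.

```python
import re
from pathlib import Path

def sweep(root, pattern, suffixes=(".c", ".h")):
    """Yield (path, line_number, stripped_line) for each line matching pattern."""
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular files with one of the requested extensions.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path, number, line.strip()
```

Calling `list(sweep("src", r"\bstrcpy\s*\("))` would return every call site under a directory named `src` (a placeholder), which is the list you would then patch without first proving each hit exploitable.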

saagarjha · 2026-03-31T08:20:42 1774945242

In a sense, breaking a vulnerability is easier than fixing it up to be an exploit.

underdeserver · 2026-03-30T21:32:56 1774906376

Specifically in software vulnerability research, you mean.

Fixing vulnerable code is usually trivial.

In the physical world breaking things is usually easier.

charcircuit · 2026-03-30T21:44:34 1774907074

A proper fix maybe. But LLMs can easily make it no longer exploitable in most cases.

stavros · 2026-03-31T10:09:24 1774951764

That's why you simply make the LLM part of the CI checks on PRs.

joatmon-snoo · 2026-03-27T03:21:37 1774581697

Humans + LLMs are really good at producing enough spam to overwhelm anything like this. There’s a reason curl bans LLM slop reports now.

joatmon-snoo · 2025-12-31T23:26:20 1767223580

The argument for not using electric sharpeners is that they (1) cut down the lifetime of your knife substantially and (2) do a mediocre job of sharpening.

Mechanically, it's just high-abrasive motorized spinning discs at preset angles. So rather than getting a good edge by taking a few microns of material off by doing it manually, you get an OK edge by taking 0.2mm off at a time. (If 0.2mm doesn't sound like a lot, think about how many mm wide your knife is.)
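
To put rough numbers on that, here is a back-of-the-envelope sketch. Only the 0.2 mm per pass comes from the paragraph above; the 10 mm of blade you can afford to lose and the 5 microns standing in for "a few microns" are illustrative assumptions.

```python
def passes_until_worn(usable_um, removed_per_pass_um):
    """Whole sharpening passes before usable_um microns of blade are gone."""
    return usable_um // removed_per_pass_um

USABLE_UM = 10_000   # assumed: 10 mm of blade height you can afford to lose
ELECTRIC_UM = 200    # 0.2 mm per pass, from the estimate above
MANUAL_UM = 5        # assumed value for "a few microns"

electric = passes_until_worn(USABLE_UM, ELECTRIC_UM)
manual = passes_until_worn(USABLE_UM, MANUAL_UM)
print(electric, manual)  # 50 2000
```

Under those assumptions the electric sharpener uses up the blade in 50 sharpenings versus 2000 by hand, a 40x difference.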

---

I'm personally 50-50 on this advice: most people don't sharpen their knives at all, and I think people are better off getting 10 OK years out of a knife than 50 terrible years out of it.

I'm also not willing to learn how to use a whetstone, so I landed in the middle on this: https://worksharptools.com/products/precision-adjust-knife-s...

dgacmu · 2025-12-31T23:56:16 1767225376

I still sharpen my knives on a whetstone, but given the general cost trajectory of most manufactured items, I've decided that I'm okay if I wear out my knives. Buying a new chef's knife in 10 years is basically free on a per-day-of-use basis.

(I say that, but I'm still using knives that mostly range from 25-50 years old, but some didn't get sharpened enough when they belonged to our parents and grandparents.)

gtowey · 2026-01-01T01:17:01 1767230221

I landed on using a diamond stone with 300 grit and 1000 grit. Unlike whetstones they never need to be flattened. I just use one of those cheap plastic angle guides. After a bit of practice you will learn to hold the angle well enough. Finish with a leather strop and some polishing compound and I can keep my knives shaving-sharp with only a few minutes effort before I cook.

joatmon-snoo · 2025-12-29T12:08:37 1767010117

There's a lot of tooling built on static binaries:

- google-wide profiling: the core C++ team can collect data on how much of fleet CPU % is spent in absl::flat_hash_map re-bucketing (you can find papers on this publicly)

- crashdump telemetry

- dapper stack trace -> codesearch

Borg literally had to pin the bash version because letting the bash version float caused bugs. I can't imagine how much harder debugging L7 proxy issues would be if I had to follow a .so rabbit hole.

I can believe shrinking binary size would solve a lot of problems, and I can imagine ways to solve the .so versioning problem, but for every problem you mention I can name multiple other probable causes (eg was startup time really execvp time, or was it networked deps like FFs).

MaskRay · 2025-12-30T05:15:38 1767071738

We are missing tooling to partition a huge binary into a few larger shared objects.

As my post https://maskray.me/blog/2023-05-14-relocation-overflow-and-c... puts it (linked by the author, thanks! But I maintain lld/ELF rather than "wrote" it; it's the engineering work of many folks).

Quoting the relevant paragraphs below:

## Static linking

In this section, we will deviate slightly from the main topic to discuss static linking. By including all dependencies within the executable itself, it can run without relying on external shared objects. This eliminates the potential risks associated with updating dependencies separately.

Certain users prefer static linking or mostly static linking for the sake of deployment convenience and performance aspects:

* Link-time optimization is more effective when all dependencies are known. Providing shared object information during executable optimization is possible, but it may not be a worthwhile engineering effort.

* Profiling techniques are more efficient dealing with one single executable.

* The traditional ELF dynamic linking approach incurs overhead to support [symbol interposition](https://maskray.me/blog/2021-05-16-elf-interposition-and-bsy...).

* Dynamic linking involves PLT and GOT, which can introduce additional overhead. Static linking eliminates the overhead.

* Loading libraries in the dynamic loader has a time complexity `O(|libs|^2*|libname|)`. The existing implementations are designed to handle tens of shared objects, rather than a thousand or more.
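
To see why that last bound matters at a thousand libraries, the sketch below simply evaluates the quoted expression `|libs|^2*|libname|` for two library counts. It is a unitless growth estimate, not a measurement of any real loader; `loader_cost` is a hypothetical name and the 20-character library name length is an assumption.

```python
def loader_cost(num_libs, name_len=20):
    """Evaluate the quoted bound |libs|^2 * |libname| (unitless)."""
    return num_libs ** 2 * name_len

small = loader_cost(30)     # "tens of shared objects"
large = loader_cost(1000)   # "a thousand or more"
print(f"{large / small:.0f}x")  # 1111x
```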

Furthermore, the current lack of techniques to partition an executable into a few larger shared objects, as opposed to numerous smaller shared objects, exacerbates the overhead issue.

In scenarios where the distributed program contains a significant amount of code (related: software bloat), employing full or mostly static linking can result in very large executable files. Consequently, certain relocations may be close to the distance limit, and even a minor disruption (e.g. adding a function or introducing a dependency) can trigger relocation overflow linker errors.
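
As an illustration of the distance limit: on x86-64, a 32-bit PC-relative relocation such as `R_X86_64_PC32` stores a signed 32-bit value computed as S + A - P, so the target has to sit within roughly 2 GiB of the referencing location. The sketch below checks that range; the addresses are made up and `fits_pc32` is a hypothetical helper, not part of any linker.

```python
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1

def fits_pc32(symbol_addr, addend, place_addr):
    """True if S + A - P fits in a signed 32-bit relocation field."""
    value = symbol_addr + addend - place_addr
    return INT32_MIN <= value <= INT32_MAX

place = 0x0040_1000          # made-up address of the referencing location
near = place + 0x7000_0000   # about 1.75 GiB away
far = place + 0x9000_0000    # about 2.25 GiB away

print(fits_pc32(near, -4, place))  # True
print(fits_pc32(far, -4, place))   # False
```

The second case is the situation that surfaces as a relocation overflow error once a binary grows past the reachable range.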

jcalvinowens · 2025-12-30T18:47:56 1767120476

> We are missing tooling to partition a huge binary into a few larger shared objects

Those who do not understand dynamic linking are doomed to reinvent it.

Filligree · 2025-12-29T13:36:11 1767015371

There’s no way my proxy binary actually requires 25GB of code, or even the 3GB it is. Sounds to me like the answer is a tree shaker.
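
A tree shaker in this sense is a reachability pass: start from the entry points, follow references, and drop whatever is never reached. A minimal sketch over a hypothetical symbol graph follows; the symbol names are invented, and real linkers do this on sections (for example with `--gc-sections`) rather than on a Python dict.

```python
from collections import deque

def reachable(graph, roots):
    """Breadth-first walk returning every node reachable from roots."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

# Invented call graph: each key references the symbols in its list.
graph = {
    "main": ["handle_request"],
    "handle_request": ["parse_headers", "route"],
    "route": ["parse_headers"],
    "legacy_codec": ["parse_headers"],
    "debug_dump": [],
}

live = reachable(graph, ["main"])
dead = sorted(set(graph) - live)
print(dead)  # ['debug_dump', 'legacy_codec']
```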

Sesse__ · 2025-12-29T14:00:35 1767016835

Google implemented the C++ equivalent of a tree shaker in their build system around 2009.

setheron · 2025-12-29T16:06:09 1767024369

AFAIK, for the front-end services to be "fast" they probably include nearly all the services they need in order to avoid hops -- so you can't really shake that much away.

joatmon-snoo · 2025-12-21T19:55:39 1766346939

(author here) To be more specific, here's a benchmark that we ran last year, where we compared schema-aligned parsing against constrained decoding (then called "Function Calling (Strict)", the orange ƒ): https://boundaryml.com/blog/sota-function-calling
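
For readers unfamiliar with the distinction: constrained decoding forces the model to emit only tokens that fit the schema, while a parse-after-the-fact approach lets the model write freely and then recovers a schema-shaped value from the text. The sketch below is only a loose illustration of the second idea. It is not the parser benchmarked in the linked post, and `lenient_parse`, the schema format, and the sample reply are all hypothetical.

```python
import json
import re

def lenient_parse(text, schema):
    """Take the span from the first '{' to the last '}' and coerce its fields."""
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found")
    raw = json.loads(match.group(0))
    # Coerce each expected field with the callable given in the schema.
    return {key: cast(raw[key]) for key, cast in schema.items()}

# Invented model reply: prose around the object, and a number sent as a string.
reply = 'Sure! Here is the result: {"name": "Ada", "age": "36"} Anything else?'
print(lenient_parse(reply, {"name": str, "age": int}))
# {'name': 'Ada', 'age': 36}
```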

skybrian · 2025-12-21T23:14:41 1766358881

I wonder what it would look like if you redid the benchmarks, testing against models that have reasoning effort set to various values. Maybe structured output is only worse if the model isn't allowed to do reasoning first?

joatmon-snoo · 2025-11-30T03:49:05 1764474545

This setting is new and was introduced in response to the first round of Shai-Hulud attacks.