An example I ran into recently: I wanted to scrape pricing data for used cars, to better inform a friend's decision about what to purchase.
I know there's a relationship between mileage and depreciation, but I wanted a better sense of what that relationship is so I'd know whether a given car was over- or underpriced.
Similarly, if I were pulling that data to build a service of my own to offer to users... is that unethical?
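One way to put a number on the mileage/depreciation relationship is to fit a curve to scraped listings and see where a given car falls relative to it. The following is a minimal sketch, assuming depreciation is roughly exponential in mileage (a modeling assumption, not something established here) and that listings for a single make/model/year are already in plain lists. The function names and sample figures are hypothetical.

```python
import math
from typing import List, Tuple


def fit_log_linear(miles: List[float], prices: List[float]) -> Tuple[float, float]:
    """Ordinary least squares on ln(price) = a + b * miles; returns (a, b).

    Assumes exponential depreciation: each extra 10k miles multiplies the
    expected price by exp(10_000 * b), so b should come out negative.
    """
    n = len(miles)
    if n < 2 or n != len(prices):
        raise ValueError("need at least two matching (miles, price) pairs")
    logs = [math.log(p) for p in prices]
    mean_x = sum(miles) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in miles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(miles, logs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def relative_gap(a: float, b: float, miles: float, asking: float) -> float:
    """Asking price relative to the fitted curve: > 0 is above it, < 0 below."""
    return asking / math.exp(a + b * miles) - 1.0


if __name__ == "__main__":
    # Hypothetical listings for one make/model/year; not real data.
    miles = [12_000, 35_000, 58_000, 90_000, 120_000]
    prices = [24_500, 21_000, 18_200, 14_900, 12_300]
    a, b = fit_log_linear(miles, prices)
    print(f"~{1 - math.exp(b * 10_000):.1%} of value lost per 10k miles")
    print(f"60k-mile car at $19,500: {relative_gap(a, b, 60_000, 19_500):+.1%} vs. curve")
```

Fitting in log space keeps the model linear (so plain least squares works) while letting each mile cost a fixed fraction of value rather than a fixed dollar amount.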
All of these questions can be answered by asking one thing: can I run the bot on the same PC I use regularly? If so, run it there. If not, don’t do it at all.
This is often really good for your bot, because anti-bot providers are loath to block what, as far as they know, is a residential CGNAT address. Sometimes you have more success scraping from home with Firefox or Chrome than with an army of proxy networks.
Why should technical capability to evade countermeasures dictate whether or not something is ethical? My view is that scraping remains ethical as long as your actions aren't causing technical problems for the operator. If anything, a retailer attempting to hide pricing data is what's unethical in my view.
Time was, you could get lovely JSON feeds from every site by taking the curl command from the browser inspector and iterating over it. Nowadays you can't even use Selenium without Cloudflare getting grouchy. Last fall I had to build my spreadsheet like a cave person: Ctrl+C, Ctrl+V. It wouldn't be so bad if the dealer aggregators' coverage were XOR, but it overlaps, so you have to dedupe listings. Then there's the whole problem of online salespeople who don't show up at the dealership.
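Deduping overlapping aggregator feeds mostly comes down to choosing an identity key per car. A minimal sketch, assuming each listing is a dict with hypothetical keys `vin`, `make`, `model`, `year`, `mileage`, and `price`: the VIN is used when present, and otherwise a fuzzy fallback (an assumption, not a standard) treats the same make/model/year within the same 1,000-mile bucket as one car.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical listing shape: keys "vin", "make", "model", "year", "mileage", "price".
Listing = Dict[str, Any]


def listing_key(listing: Listing) -> Tuple:
    """Identity key for a car: the normalized VIN when present, else a fuzzy fallback."""
    vin = str(listing.get("vin") or "").strip().upper()
    if vin:
        return ("vin", vin)
    # Fallback heuristic (an assumption): same make/model/year and mileage in
    # the same 1,000-mile bucket are treated as the same car.
    return (
        "fuzzy",
        str(listing["make"]).strip().lower(),
        str(listing["model"]).strip().lower(),
        int(listing["year"]),
        int(listing["mileage"]) // 1000,
    )


def dedupe(feeds: Iterable[Iterable[Listing]]) -> List[Listing]:
    """Merge overlapping aggregator feeds, keeping the cheapest copy of each car."""
    best: Dict[Tuple, Listing] = {}
    for feed in feeds:
        for listing in feed:
            key = listing_key(listing)
            kept = best.get(key)
            if kept is None or listing["price"] < kept["price"]:
                best[key] = listing
    return list(best.values())
```

The fixed mileage buckets are crude: two copies of the same car listed at 90,999 and 91,000 miles land in different buckets, so in practice the VIN path does most of the real work.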
There's a JavaScript property called navigator.webdriver that returns true when the browser is being driven by automation such as Selenium. Obviously, every anti-bot system checks it. Obviously, you can patch it to always say false.
It’s really one of those things that you have to see in person and walk up and down alongside to understand. The scale of it alone is a huge part of what makes it so memorable.
Similarly for Géricault's 'Raft of the Medusa'. I had only seen it in a book and had no idea how big it was. It is enormous. Go see it if you are ever at the Louvre (much more impressive than the Mona Lisa IMHO).
Well, is Pixar’s Toy Story a work of art? Or what about Julia set renderings, where people make choices about the colors? ;)
Tongue-in-cheek aside, I do think I agree with you that (1) art, as perceived by us human meatbags, is art because of the human element in it (if not in creation, then in perception), and (2) AI, absent explicit steering, trends towards a rather bland medium.
But there’s art in everything from blurry, out-of-focus disposable-camera photos, to a 5-year-old’s crayon scribble scrabbles, to the neon glitter themes we used to copy-paste onto our GeoCities and Xanga pages. As frustrating as it is to our own sensibilities, an AI prompt like “draw a pink elephant” isn’t all that different.
The element of creation is central to art. A painting or a photo of a sunset can be art, but the sunset itself is not art.
In addition, the communication doesn't need to be explicit or intentional. It can communicate something antithetical to the artist's original intent, like a blurry, out-of-focus photo. Or it can even be antithetical to the piece itself, like a lot of modern art (Fountain[1] comes to mind). I'm also sure that the 5-year-old will happily tell you a story about why they scribbled what they did. I'm not diminishing any of those. But if all the person contributes is a prompt, the text of that prompt is the extent of their art.
> a photo of a sunset can be art
...
> But if all the person contributes is a prompt, the text of that prompt is the extent of their art.
The natural question to pose is: does that mean that, for the person who pressed the camera's shutter button, that button press was the extent of their art? Of course not. Intent, sensibility, timing, recognizing that there's something special about what's in front of you, preparing and framing a composition, orchestrating poses, manipulating the medium via shutter speed, exposure, etc. to create an appropriate texture... all those and more can play a part, and the button press is just the delivery method.
Millions of photos per day are not art even if they show a pretty thing, and nobody has a problem with that. Even when we actively try to capture something special, most people will later look at their photo and say "it shows the place, but it doesn't communicate anything like what being there made me feel".
So in the same way, I think the interesting discussion won't be about AI images not being art. Millions of prompts per day will not be art, and nobody (except grifters) has a problem with that. The interesting question is how AI can become another vehicle for people to produce interesting art. Perhaps there's nothing special there. But I hope people with the drive to explore and the need to communicate keep giving it a shot and prove or disprove the notion that "it's just a prompt".
But it is completely different. To your point, it's why a 5-year-old's crayon scribble is more powerful to certain people than, for instance, Guernica. History is littered with gazillions of scribbles, stray notes, and meaningless stuff that goes straight into the bin. AI will do that too. But for something to communicate the feeeell of something, you need warmth and emotional relatability.
Since all the alleged comments are allegedly from people he knew, and not new strangers, I find it hard to believe that someone has been impersonating him on Facebook for the last 15 years.
I've never been on a security-specific team, but it's always seemed to me that triggering a bug is, for the median issue, easier than fixing it, and I mentally extend that to security issues. This holds especially true if the "bug" is a question about "what is the correct behavior?", where the "current behavior of the system" is some emergent / underspecified consequence of how different features have evolved over time.
I know this is your career, so I'm wondering what I'm missing here.
It has generally been the case that (1) finding and (2) reliably exploiting vulnerabilities is much more difficult than patching them. In fact, patching them is often so straightforward that you can kill whole bug subspecies just by sweeping the codebase for the same pattern once you see a bug. You'd do that as a matter of course, without necessarily even confirming that the bugs you're squashing are exploitable.
As bugs get more complicated, that asymmetry has become less pronounced, but the complexity of the bugs (and their patches) is offset by the increased difficulty of exploiting them, which has become an art all its own.
LLMs sharply tilt that difficulty back to the defender.
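Sweeping a codebase for the same pattern after seeing one bug (often called variant analysis) can start as little more than a regex over source files. A minimal sketch, assuming a C/C++ tree and using unbounded string calls (`strcpy`, `strcat`, `sprintf`, `gets`) as a hypothetical example pattern; real sweeps usually move on to AST- or dataflow-aware tools once the regex gets noisy.

```python
import os
import re
from typing import Iterable, Iterator, List, Tuple

# Hypothetical example pattern: unbounded C string functions, a classic bug class.
# \b keeps e.g. "vsprintf(" from matching; "strncpy(" and "snprintf(" don't contain these names.
UNSAFE_CALL = re.compile(r"\b(strcpy|strcat|sprintf|gets)\s*\(")


def iter_sources(root: str, exts: Tuple[str, ...] = (".c", ".h", ".cc", ".cpp")) -> Iterator[Tuple[str, str]]:
    """Yield (path, text) for every source file under root."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    yield path, f.read()


def sweep(sources: Iterable[Tuple[str, str]], pattern: "re.Pattern[str]" = UNSAFE_CALL) -> List[Tuple[str, int, str]]:
    """Return (path, 1-based line number, stripped line) for every match."""
    hits: List[Tuple[str, int, str]] = []
    for path, text in sources:
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path, lineno, line.strip()))
    return hits
```

Usage would be along the lines of `sweep(iter_sources("src"))`, where `src` is a placeholder directory; each hit is then a candidate for the same mechanical fix, whether or not it is exploitable.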
The argument against using electric sharpeners is that they (1) cut down the lifetime of your knife substantially and (2) do a mediocre job of sharpening.
Mechanically, it's just high-abrasive motorized spinning discs at preset angles. So rather than getting a good edge by taking a few microns of material off by doing it manually, you get an OK edge by taking 0.2mm off at a time. (If 0.2mm doesn't sound like a lot, think about how many mm wide your knife is.)
---
I'm personally 50-50 on this advice: most people don't sharpen their knives at all, and I think people are better off getting 10 OK years out of a knife than 50 terrible years out of it.
I still sharpen my knives on a whetstone, but given the general cost trajectory of most manufactured items, I've decided that I'm okay if I wear out my knives. Buying a new chef's knife in 10 years is basically free on a per-day-of-use basis.
(I say that, but I'm still using knives that mostly range from 25-50 years old, but some didn't get sharpened enough when they belonged to our parents and grandparents.)
I landed on using a diamond stone with 300 grit and 1000 grit. Unlike whetstones, they never need to be flattened. I just use one of those cheap plastic angle guides. After a bit of practice you will learn to hold the angle well enough. Finish with a leather strop and some polishing compound, and I can keep my knives shaving-sharp with only a few minutes' effort before I cook.
There's a lot of tooling built on static binaries:
- Google-Wide Profiling: the core C++ team can collect data on how much of fleet CPU % is spent in absl::flat_hash_map re-bucketing (you can find papers on this publicly)
- crashdump telemetry
- dapper stack trace -> codesearch
Borg literally had to pin the bash version because letting the bash version float caused bugs. I can't imagine how much harder debugging L7 proxy issues would be if I had to follow a .so rabbit hole.
I can believe shrinking binary size would solve a lot of problems, and I can imagine ways to solve the .so versioning problem, but for every problem you mention I can name multiple other probable causes (e.g. was startup time really execvp time, or was it networked deps like FFs?).
In this section, we will deviate slightly from the main topic to discuss static linking.
Static linking includes all dependencies within the executable itself, so it can run without relying on external shared objects.
This eliminates the potential risks associated with updating dependencies separately.
Certain users prefer static linking, or mostly static linking, for deployment convenience and performance:
* Link-time optimization is more effective when all dependencies are known. Providing shared object information during executable optimization is possible, but it may not be a worthwhile engineering effort.
* Profiling techniques are more efficient when dealing with a single executable.
* Dynamic linking involves PLT and GOT, which can introduce additional overhead. Static linking eliminates the overhead.
* Loading libraries in the dynamic loader has a time complexity `O(|libs|^2*|libname|)`. The existing implementations are designed to handle tens of shared objects, rather than a thousand or more.
Furthermore, the current lack of techniques to partition an executable into a few larger shared objects, as opposed to numerous smaller shared objects, exacerbates the overhead issue.
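To see where a bound like `O(|libs|^2*|libname|)` comes from, consider a loader that, before mapping each library, scans every already-loaded object and compares names. The following is a toy model under that assumption, not how any real dynamic loader is implemented; it charges each name comparison the length of the shorter name as a rough stand-in for `|libname|`.

```python
from typing import List, Tuple


def toy_load(names: List[str]) -> Tuple[List[str], int]:
    """Toy loader: before mapping each library, compare its name against every
    already-loaded object. Returns (loaded names, approximate character cost).

    Illustrative model only; real ld.so implementations differ in detail.
    """
    loaded: List[str] = []
    char_cost = 0
    for name in names:
        found = False
        for other in loaded:
            # Charge each comparison the shorter name's length (~|libname|).
            char_cost += min(len(name), len(other))
            if other == name:
                found = True
                break
        if not found:
            loaded.append(name)
    return loaded, char_cost
```

With N distinct names of length L, the k-th library is compared against k - 1 predecessors, for a total of L * N(N-1)/2 characters, so going from 100 to 200 libraries roughly quadruples the cost. That is tolerable for tens of shared objects and painful for a thousand.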
In scenarios where the distributed program contains a significant amount of code (related: software bloat), employing full or mostly static linking can result in very large executable files.
Consequently, certain relocations may be close to the distance limit, and even a minor disruption (e.g. adding a function or introducing a dependency) can trigger relocation overflow linker errors.
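As a concrete instance of the distance limit: on x86-64, the R_X86_64_PC32 relocation stores S + A - P (symbol address plus addend minus the address of the field being patched) in a signed 32-bit field, so the target must lie within roughly ±2 GiB of the reference. A minimal sketch of that check; the addresses in the usage block are hypothetical.

```python
# Range of a signed 32-bit field.
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def pc32_value(sym: int, addend: int, place: int) -> int:
    """R_X86_64_PC32 computes S + A - P."""
    return sym + addend - place


def pc32_fits(sym: int, addend: int, place: int) -> bool:
    """True if the PC-relative value fits the signed 32-bit field; otherwise
    the linker has to report a relocation overflow."""
    return INT32_MIN <= pc32_value(sym, addend, place) <= INT32_MAX


if __name__ == "__main__":
    # Hypothetical layout: a rel32 field near the start of .text and a callee
    # that the growing binary has pushed about 2 GiB away. The addend of -4 is
    # typical for a rel32 operand at the end of an instruction.
    field = 0x401000
    for callee in (field + 0x7FFF_0000, field + 0x8000_1000):
        print(hex(callee), pc32_fits(callee, -4, field))
```

The first callee just fits and the second does not, which is why adding a little more code to an already huge executable can flip a clean link into an overflow error (GNU ld phrases it as "relocation truncated to fit").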
(author here) To be more specific, here's a benchmark that we ran last year, where we compared schema-aligned parsing against constrained decoding (then called "Function Calling (Strict)", the orange ƒ): https://boundaryml.com/blog/sota-function-calling
I wonder what it would look like if you redid the benchmarks, testing against models with reasoning effort set to various values. Maybe structured output is only worse if the model isn't allowed to reason first?