From reading this launch post, I'm not convinced this is going to save too much ...

vladf · on Dec 13, 2022

They have a project which addresses this concern as well: https://skyplane.org/en/latest/benchmark.html

parasj · on Dec 13, 2022

I'm one of the creators of Skyplane. Skyplane can migrate large datasets between cloud regions at 10s of Gbps while compressing data to reduce egress fees. Happy to chime in!

https://github.com/skyplane-project/skyplane

ensemblehq · on Dec 13, 2022

Congrats on the launch! I had a similar idea a few years back but never managed to build it. You might want to consider other cloud providers like Sushi Cloud to get costs even lower. Happy to do an intro if that sounds interesting.

bushbaba · on Dec 13, 2022

Or to leverage cheaper compute/energy when it’s available. https://www.crusoecloud.com/features/

covi · on Dec 13, 2022

I'm one of the creators of SkyPilot. Thanks for the thoughtful questions; let me take a stab:

SkyPilot is not just for multi-cloud setups. It's useful in all of these scenarios:

- using a single region of one cloud

- using multiple regions of one cloud

- using multiple clouds

Data transfer between zones/regions within a cloud is much cheaper than across clouds. We see many users falling in the "one cloud" category and they frequently read 10s of TBs of data across regions to do ML training.

Finally, saving money is one of several key problems we aim to solve, and there are quite a few ways to save other than lots-of-compute-on-small-data. Other reasons you may want to use a system like SkyPilot include:

(1) improving resource availability (big pain point for GPUs/TPUs)

(2) using one interface and knowing that your jobs can migrate across regions or clouds

More rationale in the intro blog post: https://medium.com/@zongheng_yang/skypilot-ml-and-data-scien...

helsinkiandrew · on Dec 13, 2022

And isn’t the biggest issue with running potentially large jobs in the cloud the cutoff point at which it becomes cheaper to use your own hardware? After a few months or dozens of runs of your large model in the cloud, you may have reached the point where purchasing would have been cheaper.

Something that could look at your code, data and budget and say "for up to X runs use cloud A; for more than Y runs it would be cheaper to buy/lease these GPUs", etc., would be interesting.
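To make the cutoff concrete, here is a minimal sketch of the break-even arithmetic. It assumes a one-time hardware cost and a fixed cost per run; the function name and all dollar figures are hypothetical placeholders, not real quotes, and a real estimate would also need to account for depreciation, staffing, and utilization.

```python
import math


def break_even_runs(hardware_cost: float,
                    cloud_cost_per_run: float,
                    own_cost_per_run: float = 0.0) -> int:
    """Smallest number of runs at which owning is strictly cheaper than renting.

    Owning wins once: hardware_cost + n * own_cost_per_run < n * cloud_cost_per_run
    All inputs are illustrative assumptions supplied by the caller.
    """
    saving_per_run = cloud_cost_per_run - own_cost_per_run
    if saving_per_run <= 0:
        raise ValueError("owning never pays off if a run costs no less than in the cloud")
    # n must be strictly greater than hardware_cost / saving_per_run
    return math.floor(hardware_cost / saving_per_run) + 1


if __name__ == "__main__":
    # Hypothetical numbers: $40,000 of GPUs, $500 per cloud run,
    # $100 per run for power and upkeep on owned hardware.
    print(break_even_runs(40_000, 500, 100))
```

Below the returned number of runs the cloud is cheaper under these assumptions; at or above it, buying is.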

YetAnotherNick · on Dec 13, 2022

I think it’s common to train 100s of models on the same data for experiments. Then you would only need to copy the data once to each cloud’s storage and run experiments as you wish.

Also, most cloud providers don’t charge for ingress, so you could move the data from something like R2 to a cloud as many times as you want.
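A rough sketch of that arithmetic, assuming egress is billed per GB and ingress is free. The dataset size, per-GB price, and copy counts are made-up placeholders rather than any provider's actual rates.

```python
def egress_cost(dataset_gb: float, price_per_gb: float, copies: int) -> float:
    """Total egress bill for sending the same dataset out `copies` times."""
    return dataset_gb * price_per_gb * copies


if __name__ == "__main__":
    dataset_gb = 10_000      # hypothetical 10 TB dataset
    price_per_gb = 0.05      # hypothetical egress price in $/GB

    # Re-copying the data for each of 100 experiments
    per_experiment = egress_cost(dataset_gb, price_per_gb, copies=100)
    # Copying once to each of 3 clouds, then reusing it
    once_per_cloud = egress_cost(dataset_gb, price_per_gb, copies=3)
    # Source store that charges nothing for egress
    zero_egress = egress_cost(dataset_gb, 0.0, copies=100)

    print(per_experiment, once_per_cloud, zero_egress)
```

The bill scales with the number of copies, so copying once per cloud or starting from a store with no egress fee removes most of it.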

covi · on Dec 13, 2022

+1. We've heard from some heavy users that Cloudflare R2 is saving them $$$ on egress costs: https://www.cloudflare.com/products/r2/

As outlined in the position paper (linked by another commenter), we believe such tailwinds are increasingly helping foster the "Sky" and making it much easier to move workloads between clouds.