Next.js App Router + React Server Components Demo

new
past
show
ask
show
jobs
submit

▲Train Your Own LLM from Scratch (github.com)

474 points by kristianpaul 5 days ago | 50 comments

jvican 5 days ago [-]

If you're interested in this resource, I highly recommend checking out Stanford's CS336 class. It covers all this curriculum in a lot more depth, introduces you into a lot of theoretical aspects (scaling laws, intuitions) and systems thinking (kernel optimization/profiling). For this, you have to do the assignments, of course... https://cs336.stanford.edu/

the_real_cher 5 days ago [-]

how does one get the lectures? I don't see the option for any lectures.

eftychis 4 days ago [-]

https://github.com/stanford-cs336/lectures

azangru 4 days ago [-]

One goes to youtube and searches for cs336?

y42 4 days ago [-]

shameless plug:

A series of Jupyter notebooks explaining the whole machine learning mechanism, from the beginning

https://github.com/nickyreinert/DeepLearning-with-PyTorch-fr...

and of course also how to build an llm from scratch

https://github.com/nickyreinert/basic-llm-with-pytorch/blob/...

kriro 4 days ago [-]

I did it back in the day when fast.ai was relatively new with ULMFiT. This must have been when Bert was sota. The architecture allows you to train a base and specialize with a head. I used the entire Wikipedia for the base and then some GBs of tweets I had collected through the firehouse. I had access to a lab with 20 game dev computers. Must have been roughly GTX 2080s. One training cycle took about half a day for the tokenized Wikipedia so I hyper parameter tuned by running one different setting on each computer and then moving on with the winner as the starting point for the next day. It was always fun to come to work the next morning and check the results.

The engineering was horrible and very ad-hoc but I learned a lot. Results were ok-ish (I classified tweets) but it gave me a good perspective on the sheer GPU power (and engineering challenges) one would need to do this seriously. I didn't fully grasp the potential of generating output but spent quite some time chuckling at generated tweets (was just curious to try it).

JoeDaDude 4 days ago [-]

Coincidentally, I just started on Build a Large Language Model (From Scratch), a repo/book/course by Sebastian Raschka [0][1][2]. Maybe it is a good problem to have to have to decide which learning resource to use.

[0] https://github.com/rasbt/LLMs-from-scratch

[1] https://www.manning.com/books/build-a-large-language-model-f...

[2] https://magazine.sebastianraschka.com/p/coding-llms-from-the...

gchadwick 4 days ago [-]

I really enjoyed the book. Great for people who want to understand the real nuts and bolts and have worked examples of all of the calculations.

Rendered at 17:41:14 GMT+0000 (Coordinated Universal Time) with Vercel.