I know Yann LeCun is working on a completely different architecture, and I think that's expected to take 2-3 years before showing commercial results, right? Is that why they're finding it quicker to change the hardware?
People (researchers, investors, etc.) probably also want to see what would be possible, and someone has to do it.
I can also imagine that an inference-optimized system like this could split the context across different requests when it doesn't need to use the full context.
Could also be that they have internal use cases which require this amount of context.
Yann LeCun has been very wrong in the past about LLMs. [0] The approach he wants to take is to train on sensor data from the physical world. I think it's going to fail because there's a near-infinite amount of physical data, down to the Schrödinger equation governing how particles behave. The signal-to-noise ratio is too low. My guess is that they'll need orders of magnitude more compute to even get something useful, but they do not have more compute than OpenAI and Anthropic. In other words, I think LLMs will generate revenue as a stepping stone for OpenAI and Anthropic, such that they will be the ones who ultimately train the AI that LeCun dreams of.
[0] https://old.reddit.com/r/LovingAI/comments/1qvgc98/yann_lecu...
The most high-profile example is the latest set of Qwen models, which replace most of the attention mechanisms with Gated DeltaNet (which uses constant memory with respect to sequence length).
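For intuition, here is a minimal, hedged sketch of a delta-rule recurrence with gating, in the spirit of what Gated DeltaNet does; the shapes, the decay term `alpha`, and the write strength `beta` are simplifying assumptions, not Qwen's actual implementation:

```python
import numpy as np

def gated_delta_rule(q, k, v, alpha, beta):
    """Toy gated delta-rule recurrence.

    q, k: (T, d_k); v: (T, d_v); alpha, beta: (T,) in [0, 1].
    The recurrent state S is a fixed-size (d_k, d_v) matrix, so memory
    is constant in sequence length, unlike the (N, N) score matrix of
    softmax attention.
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))              # constant-size "memory"
    out = np.empty((T, d_v))
    for t in range(T):
        S = alpha[t] * S                  # gate: decay old associations
        # delta rule: correct the state toward the new (k, v) pair
        S = S + beta[t] * np.outer(k[t], v[t] - k[t] @ S)
        out[t] = q[t] @ S                 # read out with the query
    return out
```

Whatever the production kernel looks like, the key property is visible here: the per-step cost and the state size don't grow with how many tokens came before.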
Test-time training architectures are also getting a lot of attention, and have shown great performance in the academic setting. It's only a matter of time before we start getting open TTT models.
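For a sense of what "test-time training" means, here is a heavily simplified sketch: the layer's hidden state is the weight matrix of a tiny inner model, updated by one gradient step per token on a self-supervised loss. The reconstruction loss and shapes below are illustrative assumptions, not any specific paper's exact recipe:

```python
import numpy as np

def ttt_linear(tokens, lr=0.1):
    """Toy TTT-style layer over tokens of shape (T, d).

    The "state" W is trained at inference time: each incoming token
    provides a self-supervised target, and W takes one gradient step
    before producing the layer output. Memory stays constant in T.
    """
    T, d = tokens.shape
    W = np.zeros((d, d))
    outs = np.empty_like(tokens)
    for t in range(T):
        x = tokens[t]
        pred = W @ x
        grad = np.outer(pred - x, x)   # d/dW of 0.5 * ||W @ x - x||^2
        W -= lr * grad                 # inner-loop "training" step
        outs[t] = W @ x
    return outs
```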
Models aren't trained across their context; their context is their short-term memory at runtime, right? It has nothing to do with training. They are trained on a static dataset.
However, now that RL environments and long-horizon agentic performance have taken such a prominent role in model development, I wonder if that practice still holds. I know that the most recent Gemma and Qwen models are incomparably more reliable at long contexts than their predecessors, even though, e.g. Qwen already had a 256k context. It just didn’t work like it does now.
When you train, you teach the model to, among other things, ‘self-attend’ to the input vector, ultimately projecting that vector into a large embedding space.
Thought experiment: if 99% of the time the last 100,000 entries of your vector were zero, how likely is it that you’d end up with high-quality embeddings by doing gradient descent on those outputs?
That’s what the paper is referring to.
I've noticed that the longer a chat gets, the more unpredictable the model's behavior becomes (and I think that's still a common jailbreak technique too).
(I think it might also have something to do with RoPE, but that's beyond me.)
If your model only ever sees 8K-token samples during training, it won’t be as good at 128K context length as if you had trained on samples ranging from 8K to 128K.
Or to say it differently: the LLM is trained on static data, but in the process it is also trained on the capability of handling context itself.
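As a toy illustration of that point, a training pipeline might deliberately mix sample lengths rather than fixing one; the log-uniform choice below is an assumption for illustration, not any lab's documented recipe:

```python
import math
import random

def sample_training_lengths(n, min_len=8_192, max_len=131_072):
    """Draw n sequence lengths, log-uniformly between 8K and 128K,
    so the model actually practices long-context behaviour instead
    of only ever seeing 8K samples."""
    lo, hi = math.log2(min_len), math.log2(max_len)
    return [int(2 ** random.uniform(lo, hi)) for _ in range(n)]
```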
Kimi introduced this (https://github.com/MoonshotAI/Attention-Residuals), but I'm pretty sure closed labs like Google have had something like this for a while.
Shockingly, we seem to have found a self-attention mechanism of that quality; it just has the sad property of growing at O(N^2), where N is the context length.
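The quadratic cost is easy to see in a minimal single-head sketch (illustrative only, no optimizations):

```python
import numpy as np

def softmax_attention(q, k, v):
    """Causal scaled dot-product attention for q, k, v of shape (N, d).
    The scores matrix is (N, N), which is exactly where the O(N^2)
    time and memory cost comes from."""
    N, d = q.shape
    scores = q @ k.T / np.sqrt(d)                     # (N, N)
    causal = np.tril(np.ones((N, N), dtype=bool))     # no peeking ahead
    scores = np.where(causal, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                # (N, d)
```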
Nvidia uses ML for fine-tuning and architecting their chips; this might be one use case.
Another one would be to put EVERYTHING from your company into the context window. It would make it easier to create 'THE' model for every company or person. It might also be safer than training a model on your data, because then you don't have a model with all your data baked in, only in memory.
But maybe that’s enough tokens to feed an entire lifetime of user behaviour in for the digital twin dystopia?
Current approaches require fancy tricks to fit tokens into memory, and they spread attention thinner over larger numbers of tokens. The new approach tries to find a way to keep everything in a single shared memory and process the tokens in parallel across multiple GPUs.
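The article doesn't publish the mechanism, but one standard way to parallelize attention over a huge context is to shard the keys/values across devices and merge the partial softmax results exactly via their log-sum-exp normalizers (the trick behind ring/blockwise attention). A hedged single-query sketch, not a description of the actual system:

```python
import numpy as np

def sharded_attention(q, k_shards, v_shards):
    """q: (d,); k_shards/v_shards: lists of (n_i, d) arrays, one pair
    per (imagined) GPU. Each shard attends locally, then the partial
    outputs are combined with weights proportional to each shard's
    share of the total softmax mass."""
    partial_outs, partial_lse = [], []
    for k, v in zip(k_shards, v_shards):
        scores = k @ q / np.sqrt(q.shape[0])      # shard-local scores
        m = scores.max()
        w = np.exp(scores - m)
        partial_outs.append(w @ v / w.sum())      # shard-local attention
        partial_lse.append(m + np.log(w.sum()))   # shard-local normalizer
    lse = np.array(partial_lse)
    mix = np.exp(lse - lse.max())
    mix /= mix.sum()                              # exact softmax merge
    return sum(m_i * o_i for m_i, o_i in zip(mix, partial_outs))
```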
For 2 or 3 newspapers it works; my idea was to use it as grounding to discover relationships between people, companies and jobs.
As for the "everyone's life", I have always assumed that there would be a graph system to point to "forgotten" documents.
Gemini said my idea was amazing and new in its implementation, even if not in spirit, but I'm assuming it was being sycophantic as usual.
My sense is that this is sort of accurate, but more likely it's a result of two things:
1. LLMs are still next-token predictors, and they are trained on texts written by humans, who mostly collaborate. Staying on topic is more likely than diverging into a new idea.
2. LLMs are trained via RLHF which involves human feedback. Humans probably do prefer agreeable LLMs, which causes reinforcement at this stage.
So yes, kinda. But I'm not sure it's as clear-cut as "the researchers found humans prefer agreeableness and programmed it in."
* With Claude's 1-million-token context window I have been doing some slightly longer-range tasks (~1-3 days of work) with RPI/QRSPI frameworks (see my comments elsewhere on HN over the last few days) in one context window. They involve a grill-me session with 20-60 (sometimes more) questions per task to get alignment, which produces the design and the plan in one window.
My experience with this has been that it front-loads a lot of the LLM interaction, which can be exhausting without a reward (i.e., output). And then, when I get the output, it's so large as to be hard to review/grok.
In other words, it feels a bit like when my coworker delivers me a month's worth of work in a single PR.
We don't, no. But wouldn't it be great if we did? I'd sure love to be able to hold the entirety of the code of my organisation's monolith in my head at once. It would make everything so much easier. It would definitely also cut down on the bugs I write!
It'd be similar if I could recall all of my organisation's Confluence pages; I'd probably be a lot better at my job. Same with all the Slack history, all the HR documents, press releases, meeting transcripts. There's practically no end to useful context, even just in text form, and even if much of it is not relevant to any one task, having all of it in working memory would be fantastic, if only it were possible. I could probably find incredible cross-organisational efficiencies, and probably be far wealthier, if I were some savant who could hold all of this in my head at once.
I get that we have agent harnesses that try to fetch only the relevant information. But most of the failures I see result either from breakdowns in that process or from previous things falling out of context. I very rarely see failures where the agent forgets stuff already in context. The harnesses are making up for this exact limitation!
That sounds like the beginning of a sci-fi story where the conclusion is forgetting is not such a bad thing.
It seems far more likely that it would all get baked into the LLM during training, but maybe it will turn out to be really useful to train up a "generic robot controller LLM" and pass in a huge number of tokens to better optimize it.
I do not think it is the direction for everything.
Generally, we need consolidation of experiences and memories to just remember the important conclusions, ideas, and concepts, and then the ability to remember the full details if they are relevant (which they usually are not.)
But for some applications I am sure a billion token context would be useful.
It is likely that most people need only a 10-core CPU or whatever for most tasks, but for some applications you want a supercomputer with 1M cores.
So we need a taxonomy, we need memory layers, we need summary/details. If there is one thing I have learned about how these LLMs work, it's that if you give them a few flexible tools they can work the shit out of them to achieve objectives. We just need the right tools and the right structure for context.
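To make "memory layers" concrete, here is a hypothetical sketch of the kind of two-tier tool an agent could be handed; the names (MemoryStore, recall_summary, recall_details) are invented for illustration, not any shipping framework:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Two memory layers: cheap consolidated summaries by default,
    full details only on explicit request."""
    summaries: dict = field(default_factory=dict)
    details: dict = field(default_factory=dict)

    def remember(self, key: str, summary: str, full_text: str) -> None:
        # consolidate: keep the conclusion, archive the rest
        self.summaries[key] = summary
        self.details[key] = full_text

    def recall_summary(self, key: str) -> str:
        # the tool an agent calls first
        return self.summaries.get(key, "nothing stored")

    def recall_details(self, key: str) -> str:
        # escalation: fetch full details only when they matter
        return self.details.get(key, "nothing stored")
```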
We simply don’t know how to reliably incorporate new information without losing old capabilities. Labs handle this through extensive evaluation, heuristics, and experience.
What we do know is that models can adapt to their context, and extending the context window is an infrastructure and capex problem first. A billion useful tokens would obviate the need for any out-of-band memory structures.