- 1. Trinity large is very sparse (400B-A13B, 256 experts w/ 4 active per token).›Cameron Wolfe (Researcher at Netflix) · LLM score 85 · about 7 hours ago
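A back-of-envelope sketch of what that sparsity means. This is generic top-k MoE routing, not Trinity's actual router; only the 256-expert / 4-active and 400B / ~13B figures come from the tweet, everything else is an assumption:

```python
import math
import random

def topk_route(logits, k=4):
    """Generic top-k MoE routing sketch: pick the k highest-scoring
    experts per token and renormalize their softmax weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i])[-k:]
    m = max(logits[i] for i in top)
    w = [math.exp(logits[i] - m) for i in top]  # numerically stable softmax
    s = sum(w)
    return top, [x / s for x in w]

random.seed(0)
experts, active = 256, 4
chosen, weights = topk_route([random.gauss(0, 1) for _ in range(experts)], k=active)

# 4 of 256 experts fire per token -> ~1.6% of expert parameters active,
# which lines up with ~13B active out of 400B total.
print(active / experts)                     # 0.015625
print(len(chosen), round(sum(weights), 6))  # 4 1.0
```

The routing weights are renormalized over the winners only, which is why the active-parameter count, not the total, dominates per-token compute.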
- 2. We got Claude to teach open models how to write CUDA kernels.›Ben Burtenshaw (Hugging Face Researcher) · LLM score 85 · about 24 hours ago
- 3. BIG new idea in interpretability called Patterning›alphaXiv · LLM score 75 · 1 day ago
- 4. @charles_irl In case it’s not clear in the docs: - Ancestor https://t.co/v4FOLUBHz9’s are loaded into context automatically on startup›Boris Cherny (Creator of Claude Code) · LLM score 20 · 1 day ago
- 5. "LLM-in-Sandbox Elicits General Agentic Intelligence"›alphaXiv · LLM score 85 · 1 day ago
- 6. @unsorsodicorda @saurabh_shah2 @Tim_Dettmers GA is trained using 4.5 Air as the teacher, whereas SERA-32B uses GLM 4.6.›Ethan Shen (Ai2 Researcher) · LLM score 85 · 1 day ago
- 7. his work was mostly the genius of Ethan Shen.›Tim Dettmers (Research Scientist at Ai2) · LLM score 70 · 2 days ago
- 8. Our method became so efficient (26x vs RL; 57x vs other synth gen) that we could easily generate 1000s of trajectories for a single repo.›Tim Dettmers (Research Scientist at Ai2) · LLM score 85 · 2 days ago
- 9. This is very impactful: you can now distill frontier performance into small models that are specialized to private repositories.›Tim Dettmers (Research Scientist at Ai2) · LLM score 75 · 2 days ago
- 10. From there we could do massive amounts of experiments and really understand what matters for training coding agents.›Tim Dettmers (Research Scientist at Ai2) · LLM score 80 · 2 days ago
- 11. Continual learning is a popular topic in LLM research, but it might not be as far away as we think.›Cameron Wolfe (Researcher at Netflix) · LLM score 65 · 2 days ago
- 12. Finally, we conduct an analysis of variance across SWE-Bench runs.›Ethan Shen (Ai2 Researcher) · LLM score 95 · 2 days ago
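"Variance across runs" here means re-evaluating the same setup repeatedly and measuring spread. A toy illustration of that computation; the resolve rates below are invented for the example, not taken from the paper:

```python
import statistics

# Hypothetical resolve rates from repeated runs of one model/config
# on an agentic benchmark (made-up numbers, for illustration only).
resolve_rates = [0.312, 0.298, 0.325, 0.305, 0.318]

mean = statistics.mean(resolve_rates)
std = statistics.stdev(resolve_rates)  # sample standard deviation

# Report spread alongside the point estimate: a ~1-point standard
# deviation means single-run benchmark deltas of that size are noise.
print(f"{mean:.3f} ± {std:.3f} over {len(resolve_rates)} runs")
```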
- 13. We also experiment with mixing data from rollouts and discover that training on SERA’s first and second rollouts complements each other, increasing performance in data-constrained regimes.›Ethan Shen (Ai2 Researcher) · LLM score 85 · 2 days ago
- 14. Even more interesting is the fact that the best data comes from truncated trajectories that just barely exceed the context limit.›Ethan Shen (Ai2 Researcher) · LLM score 85 · 2 days ago
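A minimal sketch of the selection rule that tweet describes: keep only trajectories that just barely overflow the context window, then truncate them to fit. The function name, the 32k limit, and the 5% "just barely" margin are all assumptions for illustration, not details from the SERA work:

```python
CONTEXT_LIMIT = 32_768
MARGIN = 0.05  # "just barely": at most 5% past the limit (assumed)

def select_and_truncate(trajectories):
    """trajectories: list of token-id sequences from agent rollouts.
    Keep those slightly exceeding the context limit, cut to fit."""
    kept = []
    for toks in trajectories:
        if CONTEXT_LIMIT < len(toks) <= CONTEXT_LIMIT * (1 + MARGIN):
            kept.append(toks[:CONTEXT_LIMIT])  # truncate to the limit
    return kept

runs = [list(range(30_000)),   # fits in context: excluded by this rule
        list(range(33_000)),   # ~0.7% over the limit: kept, truncated
        list(range(40_000))]   # far over the limit: excluded
print([len(t) for t in select_and_truncate(runs)])  # [32768]
```

The intuition is that trajectories which barely overflow are long, dense problem-solving traces, while far-overflowing ones lose too much when cut.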
- 15. Community-built open benchmarks work really well, e.g., Terminal-Bench, HLE, MMTEB.›Niklas Muennighoff (AI Researcher at Stanford) · LLM score 80 · 2 days ago
- 16. Had to cut this one for space: 2019: AI can't create art—creativity is uniquely human›Noam Brown (OpenAI Research Scientist) · LLM score 20 · 3 days ago
- 17. @0xabi96 It feels like I’m cheating.›Andrej Karpathy (AI researcher) · LLM score 70 · 3 days ago
- 18. @ChiragLathiya The nearest neighbor really is some kind of a junior engineer.›Andrej Karpathy (AI researcher) · LLM score 80 · 3 days ago
- 19. @jeremytwei Love the word "comprehension debt", haven't encountered it so far, it's very accurate.›Andrej Karpathy (AI researcher) · LLM score 30 · 3 days ago
- 20. @airesearch12 💯 @ Spec-driven development It's the limit of imperative -> declarative transition, basically being declarative entirely.›Andrej Karpathy (AI researcher) · LLM score 60 · 3 days ago
- 21. A few random notes from claude coding quite a bit last few weeks.›Andrej Karpathy (AI researcher) · LLM score 20 · 3 days ago
- 22. 1987: AI can't win at chess—planning is uniquely human›Noam Brown (OpenAI Research Scientist) · LLM score 70 · 3 days ago
- 23. Had a great discussion on AI's major societal impacts and emerging risks on the @RestIsPolitics podcast this past November.›Yoshua Bengio · LLM score 20 · 3 days ago
- 24. Success in the Information Age was about being able to answer questions.›Jonathan Ross (TPU Creator) · LLM score 20 · 3 days ago
- 25. this is a blog post on claude + llama.cpp https://t.co/yej6WsNnQA›Ben Burtenshaw (Hugging Face Researcher) · LLM score 20 · 3 days ago
- 27. I know it's massively off trend, but I love setting up a nice quiver of models in @cursor_ai, reviewing generated tokens in the legit ui, switching models for certain parts, and just generally getting shit done.›Ben Burtenshaw (Hugging Face Researcher) · LLM score 30 · 3 days ago
- 28. I don't think people have realized how crazy the results are from this new TTT + RL paper from Stanford/Nvidia.›Ronak Malde (DeepMind Researcher) · LLM score 85 · 3 days ago
- 29. Continual learning is being positioned as a prerequisite for AGI (i.e., general systems must be adaptable).›Cameron Wolfe (Researcher at Netflix) · LLM score 70 · 4 days ago
- 30. I just watched a really great conversation about the future of AI.›Geoffrey Hinton · LLM score 30 · 4 days ago