- 241.Julian was co-first author on AlphaGo, AlphaZero, and MuZero.›Noam Brown (OpenAI Research Scientist) · LLM score 40 · 4 months ago
- 242.I agree AI discourse today feels like covid discourse in Feb/Mar 2020.›Noam Brown (OpenAI Research Scientist) · LLM score 65 · 4 months ago
- 243.It's often overlooked how building evals is some of the deepest, most foundational work in AI research.›Alexander Wei (OpenAI Researcher) · LLM score 40 · 4 months ago
- 244.I'm specifically excited about having tasks that go beyond math and coding.›Kai (OpenAI Researcher) · LLM score 70 · 4 months ago
- 245.As academic benchmarks become saturated, it's increasingly important to have benchmarks that actually reflect real-world capability.›Kai (OpenAI Researcher) · LLM score 85 · 4 months ago
- 246.@swyx That’s a good point.›Noam Brown (OpenAI Research Scientist) · LLM score 70 · 4 months ago
- 247.12/12 problems solved, which would be equivalent to a 1st place performance.›Noam Brown (OpenAI Research Scientist) · LLM score 80 · 4 months ago
- 248.GPT-5-Codex is 10x faster for the easiest queries, and will think 2x longer for the hardest queries that benefit most from more compute.›Noam Brown (OpenAI Research Scientist) · LLM score 10 · 5 months ago
- 249.When we at @OpenAI released o1-preview a year ago, it would think for seconds.›Noam Brown (OpenAI Research Scientist) · LLM score 75 · 5 months ago
- 250.@isthisnessicary @emollick Some evals are harder to beat even with targeted data.›Noam Brown (OpenAI Research Scientist) · LLM score 70 · 5 months ago
- 251.@Qutossar @emollick Maximizing this benchmark would probably be useful for improving a model's ability to read clocks.›Noam Brown (OpenAI Research Scientist) · LLM score 75 · 5 months ago
- 252.@emollick Unfortunately, once an eval like this becomes high profile it loses value because it’s pretty easy to maximize it with targeted data.›Noam Brown (OpenAI Research Scientist) · LLM score 80 · 5 months ago
- 253.@sriramk Though by "long time" I mean 3+ years.›Noam Brown (OpenAI Research Scientist) · LLM score 20 · 5 months ago
- 254.@sriramk I think these models will quickly improve at verifying their own output, but I agree that AI will be worse than humans in some ways for a long time.›Noam Brown (OpenAI Research Scientist) · LLM score 70 · 5 months ago
- 255.@erikbryn There are real limitations to what AI models can do today, but I think it’s important to consider the slope of progress.›Noam Brown (OpenAI Research Scientist) · LLM score 70 · 5 months ago
- 256.@erikbryn From what I've seen, a lot of critics don't have a good understanding of where the frontier really is.›Noam Brown (OpenAI Research Scientist) · LLM score 65 · 5 months ago
- 257.@emollick Also, those forecasts were for *any* AI system to get an IMO gold.›Noam Brown (OpenAI Research Scientist) · LLM score 70 · 5 months ago
- 258.@conitzer Intro to ML introduces a lot of concepts that are built upon in future ML classes, so I think it makes more sense as the intro class.›Noam Brown (OpenAI Research Scientist) · LLM score 40 · 5 months ago
- 259.@rishicomplex I've spoken with profs at a lot of universities about this and almost all agree Intro to AI should cover more ML.›Noam Brown (OpenAI Research Scientist) · LLM score 60 · 5 months ago
- 260.Clarification: I was comparing A @ B + C here, where the cute-dsl version is quite good at overlapping the epilogue.›Tri Dao (Chief Scientist at Together) · LLM score 85 · 5 months ago
- 261.We officially entered the 2025 International Olympiad in Informatics (IOI) online competition track and adhered to the same restrictions as the human contestants, including submissions and time limits, but without direct supervision from the contest organizers.›Sheryl Hsu (OpenAI Researcher) · LLM score 70 · 6 months ago
- 262.3/ We’ve come a long way since last summer.›Alexander Wei (OpenAI Researcher) · LLM score 70 · 6 months ago
- 263.5/n It’s been really exciting to see the progress of our newest research methods at OpenAI, with our successes at the AtCoder World Finals, IMO, and IOI over the last couple weeks.›Sheryl Hsu (OpenAI Researcher) · LLM score 20 · 6 months ago
- 264.4/n This result demonstrates a huge improvement over @OpenAI’s attempt at IOI last year where we finished just shy of a bronze medal with a significantly more handcrafted test-time strategy.›Sheryl Hsu (OpenAI Researcher) · LLM score 80 · 6 months ago
- 265.A major cut to the funding of the National Science Foundation would be very bad for the future of the US.›Geoffrey Hinton · LLM score 20 · 6 months ago
- 266.Hierarchical layout is super elegant.›Tri Dao (Chief Scientist at Together) · LLM score 85 · 6 months ago
- 267.Becoming an RL diehard in the past year and thinking about RL for most of my waking hours inadvertently taught me an important lesson about how to live my own life.›Jason Wei (AI Researcher at Meta) · LLM score 60 · 7 months ago
- 268.New blog post about asymmetry of verification and "verifier's law": https://t.co/bvS8HrX1jP›Jason Wei (AI Researcher at Meta) · LLM score 80 · 7 months ago
- 269.I played w it for 1h.›Tri Dao (Chief Scientist at Together) · LLM score 35 · 7 months ago
- 270.@RaghuGanti @cHHillee Oh you’d want to use warp reduction if the whole row fits into 1 warp.›Tri Dao (Chief Scientist at Together) · LLM score 80 · 7 months ago