HN - AI300

241.
Julian was co-first author on AlphaGo, AlphaZero, and MuZero.
Noam Brown (OpenAI Research Scientist) · LLM score 40 · 4 months ago
242.
I agree AI discourse today feels like covid discourse in Feb/Mar 2020.
Noam Brown (OpenAI Research Scientist) · LLM score 65 · 4 months ago
243.
It's often overlooked how building evals is some of the deepest, most foundational work in AI research.
Alexander Wei (OpenAI Researcher) · LLM score 40 · 4 months ago
244.
I'm specifically excited about having tasks that go beyond math and coding.
Kai (OpenAI Researcher) · LLM score 70 · 4 months ago
245.
As academic benchmarks become saturated, it's increasingly important to have benchmarks that actually reflect real-world capability.
Kai (OpenAI Researcher) · LLM score 85 · 4 months ago
246.
@swyx That’s a good point.
Noam Brown (OpenAI Research Scientist) · LLM score 70 · 4 months ago
247.
12/12 problems solved, which would be equivalent to a 1st place performance.
Noam Brown (OpenAI Research Scientist) · LLM score 80 · 4 months ago
248.
GPT-5-Codex is 10x faster for the easiest queries, and will think 2x longer for the hardest queries that benefit most from more compute.
Noam Brown (OpenAI Research Scientist) · LLM score 10 · 5 months ago
249.
When we at @OpenAI released o1-preview a year ago, it would think for seconds.
Noam Brown (OpenAI Research Scientist) · LLM score 75 · 5 months ago
250.
@isthisnessicary @emollick Some evals are harder to beat even with targeted data.
Noam Brown (OpenAI Research Scientist) · LLM score 70 · 5 months ago
251.
@Qutossar @emollick Maximizing this benchmark would probably be useful for improving a model's ability to read clocks.
Noam Brown (OpenAI Research Scientist) · LLM score 75 · 5 months ago
252.
@emollick Unfortunately, once an eval like this becomes high profile it loses value because it’s pretty easy to maximize it with targeted data.
Noam Brown (OpenAI Research Scientist) · LLM score 80 · 5 months ago
253.
@sriramk Though by "long time" I mean 3+ years.
Noam Brown (OpenAI Research Scientist) · LLM score 20 · 5 months ago
254.
@sriramk I think these models will quickly improve at verifying their own output, but I agree that AI will be worse than humans in some ways for a long time.
Noam Brown (OpenAI Research Scientist) · LLM score 70 · 5 months ago
255.
@erikbryn There are real limitations to what AI models can do today, but I think it’s important to consider the slope of progress.
Noam Brown (OpenAI Research Scientist) · LLM score 70 · 5 months ago
256.
@erikbryn From what I've seen, a lot of critics don't have a good understanding of where the frontier really is.
Noam Brown (OpenAI Research Scientist) · LLM score 65 · 5 months ago
257.
@emollick Also, those forecasts were for *any* AI system to get an IMO gold.
Noam Brown (OpenAI Research Scientist) · LLM score 70 · 5 months ago
258.
@conitzer Intro to ML introduces a lot of concepts that are built upon in future ML classes, so I think it makes more sense as the intro class.
Noam Brown (OpenAI Research Scientist) · LLM score 40 · 5 months ago
259.
@rishicomplex I've spoken with profs at a lot of universities about this and almost all agree Intro to AI should cover more ML.
Noam Brown (OpenAI Research Scientist) · LLM score 60 · 5 months ago
260.
Clarification: I was comparing A @ B + C here, where the cute-dsl version is quite good at overlapping the epilogue.
Tri Dao (Chief Scientist at Together) · LLM score 85 · 5 months ago
261.
We officially entered the 2025 International Olympiad in Informatics (IOI) online competition track and adhered to the same restrictions as the human contestants, including submissions and time limits, but without direct supervision from the contest organizers.
Sheryl Hsu (OpenAI Researcher) · LLM score 70 · 6 months ago
262.
3/ We’ve come a long way since last summer.
Alexander Wei (OpenAI Researcher) · LLM score 70 · 6 months ago
263.
5/n It’s been really exciting to see the progress of our newest research methods at OpenAI, with our successes at the AtCoder World Finals, IMO, and IOI over the last couple weeks.
Sheryl Hsu (OpenAI Researcher) · LLM score 20 · 6 months ago
264.
4/n This result demonstrates a huge improvement over @OpenAI’s attempt at IOI last year where we finished just shy of a bronze medal with a significantly more handcrafted test-time strategy.
Sheryl Hsu (OpenAI Researcher) · LLM score 80 · 6 months ago
265.
A major cut to the funding of the National Science Foundation would be very bad for the future of the US.
Geoffrey Hinton · LLM score 20 · 6 months ago
266.
Hierarchical layout is super elegant.
Tri Dao (Chief Scientist at Together) · LLM score 85 · 6 months ago
267.
Becoming an RL diehard in the past year and thinking about RL for most of my waking hours inadvertently taught me an important lesson about how to live my own life.
Jason Wei (AI Researcher at Meta) · LLM score 60 · 7 months ago
268.
New blog post about asymmetry of verification and "verifier's law": https://t.co/bvS8HrX1jP
Jason Wei (AI Researcher at Meta) · LLM score 80 · 7 months ago
269.
I played w it for 1h.
Tri Dao (Chief Scientist at Together) · LLM score 35 · 7 months ago
270.
@RaghuGanti @cHHillee Oh you’d want to use warp reduction if the whole row fits into 1 warp.
Tri Dao (Chief Scientist at Together) · LLM score 80 · 7 months ago