
Simon Willison’s Weblog


9 posts tagged “research”

2026

Unicode Explorer using binary search over fetch() HTTP range requests. Here's a little prototype I built this morning from my phone as an experiment in HTTP range requests, and a general example of using LLMs to satisfy curiosity.

I've been collecting HTTP range tricks for a while now, and I decided it would be fun to build something with them myself that used binary search against a large file to do something useful.

So I brainstormed with Claude. The challenge was coming up with a use case for binary search where the data could be naturally sorted in a way that would benefit from binary search.

One of Claude's suggestions was looking up information about unicode codepoints, which means searching through many MBs of metadata.
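As a rough illustration of the idea (not the code from the actual tool, which runs in the browser with fetch()), here is a minimal Python sketch of binary search over a sorted file that is only ever read in small byte ranges. The fixed-width record layout, `RECORD_SIZE`, `make_record` and `read_from_memory` are all assumptions invented for this example; the real data file may be laid out quite differently.

```python
RECORD_SIZE = 16  # hypothetical layout: 6 hex digits of codepoint + 10 bytes of name


def make_record(codepoint, name):
    """Build one fixed-width record: zero-padded hex codepoint, then a padded name."""
    return f"{codepoint:06X}{name:<10.10}".encode("ascii")


def binary_search(read_range, total_size, target):
    """Find `target` in a file of sorted fixed-width records.

    `read_range(start, length)` returns `length` bytes starting at byte
    offset `start`; in a real deployment that would be one HTTP range request.
    Returns (name or None, list of record indexes that were probed).
    """
    lo, hi = 0, total_size // RECORD_SIZE - 1
    steps = []
    while lo <= hi:
        mid = (lo + hi) // 2
        record = read_range(mid * RECORD_SIZE, RECORD_SIZE)
        codepoint = int(record[:6].decode("ascii"), 16)
        steps.append(mid)
        if codepoint == target:
            return record[6:].decode("ascii").rstrip(), steps
        if codepoint < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, steps


# In-memory stand-in for the remote file, sorted by codepoint.
DATA = b"".join(
    make_record(cp, name)
    for cp, name in [
        (0x26, "AMPERSAND"),
        (0x41, "LATIN A"),
        (0xF8, "O STROKE"),
        (0x1F99C, "PARROT"),
    ]
)


def read_from_memory(start, length):
    """Stand-in for a range request: slice the bytes out of DATA."""
    return DATA[start:start + length]


name, steps = binary_search(read_from_memory, len(DATA), 0x26)
print(name, "found after", len(steps), "reads")
```

Each probe reads a single record, so the number of bytes transferred grows with the logarithm of the file size rather than with the file size itself.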

I had Claude write me a spec to feed to Claude Code - visible here - then kicked off an asynchronous research project with Claude Code for web against my simonw/research repo to turn that into working code.

Here's the resulting report and code. One interesting thing I learned is that Range request tricks aren't compatible with HTTP compression because they mess with the byte offset calculations. I added 'Accept-Encoding': 'identity' to the fetch() calls but this isn't actually necessary because Cloudflare and other CDNs automatically skip compression if a content-range header is present.
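To make the header detail concrete, here is a small Python sketch of how a byte range request could be constructed and how the Content-Range response header could be read back. It only builds the request object and never sends it; the function names and the example.com URL are placeholders made up for this illustration, and the deployed tool uses fetch() in the browser rather than anything like this.

```python
import re
import urllib.request


def range_headers(start, length):
    """Request headers for `length` bytes starting at byte offset `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length - 1  # HTTP byte ranges are inclusive at both ends
    return {
        "Range": f"bytes={start}-{end}",
        # Ask for the uncompressed representation so offsets match the raw file.
        "Accept-Encoding": "identity",
    }


def build_range_request(url, start, length):
    """Build (but do not send) a urllib request for a byte range."""
    return urllib.request.Request(url, headers=range_headers(start, length))


def parse_content_range(value):
    """Parse 'bytes 0-99/1234' into (first, last, total); total is None for '*'."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unrecognised Content-Range: {value!r}")
    total = None if match.group(3) == "*" else int(match.group(3))
    return int(match.group(1)), int(match.group(2)), total


# Placeholder URL, used only to show the shape of the request.
request = build_range_request("https://example.com/data.bin", 1024, 256)
print(request.get_header("Range"))
```

The total size reported in Content-Range is handy for a binary search because it gives the upper bound without a separate HEAD request.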

I deployed the result to my tools.simonwillison.net site, after first tweaking it to query the data via range requests against a CORS-enabled 76.6MB file in an S3 bucket fronted by Cloudflare.

The demo is fun to play with - type in a single character like ø or a hexadecimal codepoint indicator like 1F99C and it will binary search its way through the large file and show you the steps it takes along the way:

Animated demo of a web tool called Unicode Explorer. I enter the ampersand character and hit Search. A box below shows the sequence of HTTP binary search requests made, finding the character in 17 steps with 3,864 bytes transferred and telling me that ampersand is U+0026 in Punctuation other, Basic Latin

# 27th February 2026, 5:50 pm / algorithms, http, research, tools, unicode, ai, generative-ai, llms, ai-assisted-programming, vibe-coding, http-range-requests

But the intellectually interesting part for me is something else. I now have something close to a magic box where I throw in a question and a first answer comes back basically for free, in terms of human effort. Before this, the way I'd explore a new idea is to either clumsily put something together myself or ask a student to run something short for signal, and if it's there, we’d go deeper. That quick signal step, i.e., finding out if a question has any meat to it, is what I can now do without taking up anyone else's time. It’s now between just me, Claude Code, and a few days of GPU time.

I don’t know what this means for how we do research long term. I don’t think anyone does yet. But the distance between a question and a first answer just got very small.

Dimitris Papailiopoulos, on running research questions through Claude Code

# 17th February 2026, 2:04 pm / research, ai, generative-ai, llms, coding-agents, claude-code

2025

I take tap dance evening classes at the College of San Mateo community college. A neat bonus of this is that I'm now officially a student of that college, which gives me access to their library... including the ability to send text messages to the librarians asking for help with research.

I recently wrote about Coutellerie Nontronnaise on my Niche Museums website, a historic knife manufactory in Nontron, France. They had a certificate on the wall claiming that they had previously held a Guinness World Record for the smallest folding knife, but I had been unable to track down any supporting evidence.

I posed this as a text message challenge to the librarians, and they tracked down the exact page from the 1989 "Le livre guinness des records" describing the record:

Le plus petit

Les établissements Nontronnaise ont réalisé un couteau de 10 mm de long, pour le Festival d’Aubigny, Vendée, qui s’est déroulé du 4 au 5 juillet 1987.

Thank you, Maria at the CSM library!

# 4th December 2025, 11:52 pm / libraries, museums, research

The SIFT method (via) The SIFT method is "an evaluation strategy developed by digital literacy expert, Mike Caulfield, to help determine whether online content can be trusted for credible or reliable sources of information."

This looks extremely useful as a framework for helping people more effectively consume information online (increasingly gathered with the help of LLMs).

  • Stop. "Be aware of your emotional response to the headline or information in the article" to protect against clickbait, and don't read further or share until you've applied the other three steps.
  • Investigate the Source. Apply lateral reading, checking what others say about the source rather than just trusting their "about" page.
  • Find Better Coverage. "Use lateral reading to see if you can find other sources corroborating the same information or disputing it" and consult trusted fact checkers if necessary.
  • Trace Claims, Quotes, and Media to their Original Context. Try to find the original report or referenced material to learn more and check it isn't being represented out of context.

This framework really resonates with me: it formally captures and improves on a bunch of informal techniques I've tried to apply in my own work.

# 7th September 2025, 8:51 pm / blogging, research, ai-assisted-search, digital-literacy

The Wikimedia Research Newsletter (via) Speaking of summarizing research papers, I just learned about this newsletter and it is an absolute gold mine:

The Wikimedia Research Newsletter (WRN) covers research of relevance to the Wikimedia community. It has been appearing generally monthly since 2011, and features both academic research publications and internal research done at the Wikimedia Foundation.

The March 2025 issue had a fascinating section titled So again, what has the impact of ChatGPT really been? pulled together by WRN co-founder Tilman Bayer. It covers ten different papers, here's one note that stood out to me:

[...] the authors observe an increasing frequency of the words “crucial” and “additionally”, which are favored by ChatGPT [according to previous research] in the content of Wikipedia article.

# 13th June 2025, 8:24 pm / research, wikipedia, chatgpt, paper-review

2016

Help with next steps for a startup

Have you thought about applying to Y Combinator? The reason I ask is that “I have lots of expertise in language learning and basically zero expertise in startups, market research, business, fundraising, app pricing, etc” is pretty much YC’s sweet spot: they know that it’s much easier teaching those things to engineers and makers than it is to teach engineering to business people (I’m assuming you have product and engineering skills based on your description of your progress so far).

[... 175 words]

2009

Building Rome in a Day (via) “The first system capable of city-scale reconstruction from unstructured photo collections”—computer vision techniques used to construct 3D models of cities using tens of thousands of photos from Flickr. Reminiscent of Microsoft PhotoSynth.

# 29th July 2009, 3:41 pm / 3d, computer-vision, flickr, photos, photosynth, research, rome

2008

Yahoo! Releases OpenID Research. Extremely valuable research, conducted with a group of typical Yahoo! users. OpenID's usability remains bad, and if we don’t get it right soon something centralised like Facebook Connect will take over and the Web will stop being open.

# 14th October 2008, 4:59 pm / facebook, facebookconnect, openid, research, usability, yahoo

2006

Can social bookmarking services prevent bookmarks from becoming dead links?

Yahoo!’s MyWeb 2.0 can do that. (Disclaimer: I work for Yahoo!, but not directly on that product).

[... 36 words]