“Trying out Llama.cpp’s New Vision Support,” n.d. https://simonwillison.net/2025/May/10/llama-cpp-vision/#atom-everything
#vllm
“Updated Rate Limits for Unauthenticated Requests,” n.d. https://github.blog/changelog/2025-05-08-updated-rate-limits-for-unauthenticated-requests/
“REVEAL: Multi-Turn Evaluation of Image-Input Harms for Vision LLMs,” n.d. https://arxiv.org/pdf/2505.04673
“Carousels with CSS,” n.d. https://developer.chrome.com/blog/carousels-with-css
“Saying ‘Hi’ to Microsoft’s Phi-4-Reasoning,” n.d. https://simonwillison.net/2025/May/6/phi-4-reasoning/#atom-everything
I still don’t fully understand how this reasoning works. From what I can tell, though, the LLM asks itself the same question multiple times and settles on the result that comes up most often.
Sadly, this again means it does not come up with novel solutions; it still operates within the known space, limited by token probability.
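If that reading is right, the mechanism resembles self-consistency sampling: draw several answers at a non-zero temperature and keep the majority vote. A minimal sketch in Python, assuming a hypothetical `sample_completion` helper standing in for a real model call:

```python
import random
from collections import Counter

def sample_completion(prompt: str, temperature: float = 0.8) -> str:
    # Hypothetical stand-in for a real model API call; replace with
    # an actual client to use this for real. The toy distribution
    # below just simulates sampling noise.
    return random.choice(["42", "42", "41"])

def self_consistent_answer(prompt: str, n_samples: int = 8) -> str:
    """Ask the same question several times with sampling enabled,
    then keep the answer that appears most often (majority vote)."""
    answers = [sample_completion(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistent_answer("What is 6 * 7?"))
```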
“Feed a Video to a Vision LLM as a Sequence of JPEG Frames on the CLI (Also LLM 0.25),” n.d. https://simonwillison.net/2025/May/5/llm-video-frames/#atom-everything
- The dog’s color is wrong.
- The dog doesn’t sniff the cupcake.
- The dog doesn’t lick the human’s fingers.
The output is heavily hallucinated, partly because frames taken out of context do not represent what is really happening, and partly because the model invents details on top of that.
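The underlying technique is easy to reproduce by hand: extract JPEG frames with ffmpeg and pass them to a vision model as image attachments. A rough sketch in Python, assuming ffmpeg and the `llm` CLI are installed and the chosen model accepts images (the filename `dog.mp4` is just an example); the post’s llm-video-frames plugin packages this workflow properly:

```python
import subprocess
from pathlib import Path

def describe_video(video: str, prompt: str, fps: int = 1) -> str:
    """Extract JPEG frames from a video and feed them to a vision LLM.
    Assumes ffmpeg and the `llm` CLI are on PATH."""
    out = Path("frames")
    out.mkdir(exist_ok=True)
    # One JPEG per second of video (fps=1).
    subprocess.run(
        ["ffmpeg", "-i", video, "-vf", f"fps={fps}",
         str(out / "frame_%04d.jpg")],
        check=True,
    )
    # llm's -a flag attaches an image to the prompt.
    cmd = ["llm", prompt]
    for frame in sorted(out.glob("frame_*.jpg")):
        cmd += ["-a", str(frame)]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout

print(describe_video("dog.mp4", "Describe what happens in this video."))
```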
“AI-Generated Vulnerability Reports Suck,” n.d. https://hackerone.com/reports/3125832