• 0 Posts
  • 128 Comments
Joined 4 months ago
Cake day: February 15th, 2026




  • It is like how the machines of the industrial revolution stayed and flourished, while child labor and unfair labor practices were fought against.

    Clearly, this is not a justification for current LLM systems or those feeding them. I feel it is important to hold these companies accountable for the crimes they have committed.

    In the long run, I feel it is also important to think about what we need to do to keep LLMs working for humanity, and to organize to make that a reality. If that requires removing LLMs as a technology entirely, so be it; but I am not convinced we need to go that far.




  • It seems like you are really looking forward to a ground invasion to remove the regime. I can totally understand that.

    I feel the reluctance on the U.S. side comes from the fact that the U.S. has fought a number of these drawn-out invasions, with large spending and human casualties. In the end, it left these countries in chaos and stagnation, arguably not much better off than before.

    Do you feel that a ground invasion of Iran would be a different story?

    Plus, how do you feel about the U.S. bombing Iranian universities? Do you feel it is necessary?

    (Sorry, it feels a bit like I am pushing your answers in a particular direction. I am genuinely curious what you think.)


  • That is the thing about formal proofs: if the definitions are correct (they are usually relatively short and should be written by a human), there is almost no chance of the proof being wrong. The only exception is when the LLM exploits a bug in the proof assistant's kernel, and these kernels are usually designed to be exceptionally small, which makes bugs unlikely.

    That being said, Opus 4.6 found a bug that eventually led to a proof of False (Opus itself was unable to produce that proof of False, so it is unlikely to exploit the bug): https://github.com/rocq-prover/rocq/issues/21682

    However, like I said, the code quality of LLMs is usually not on par with an expert's, and they tend to produce unnecessary lemmas and complications that have to be cleaned up by humans.

    Also, we have a very detailed pen-and-paper proof that was designed to be easily translatable into a proof assistant. We have also set up all the lemmas and theorems needed to reach the end goal. All of this was done by humans; without it, I don't believe any LLM could make much progress on this project.
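    To make the point about definitions concrete, here is a toy Lean 4 sketch (not from our project; `isSorted`, `badSort`, and the `Demo` namespace are hypothetical names made up for illustration). The kernel accepts the theorem, but only because the human-written statement asks for sortedness and forgets to require that the output be a permutation of the input. The proof is "correct"; the specification is what is wrong, which is why the definitions and statements are the part a human has to get right.

    ```lean
    namespace Demo

    -- Hypothetical human-written specification of "sorted".
    def isSorted : List Nat → Prop
      | [] => True
      | [_] => True
      | a :: b :: rest => a ≤ b ∧ isSorted (b :: rest)

    -- A deliberately wrong "sort" that throws its input away.
    def badSort (_ : List Nat) : List Nat := []

    -- The kernel accepts this: the statement only demands sortedness and
    -- never says the output must be a permutation of the input.
    theorem badSort_sorted (xs : List Nat) : isSorted (badSort xs) := by
      simp [badSort, isSorted]

    end Demo
    ```

    A faithful statement would also require something like "the output is a permutation of the input", and with that requirement `badSort` could no longer be proved correct.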




  • There are proof assistants (https://en.wikipedia.org/wiki/Proof_assistant) that encode a mathematical proof as code and verify its correctness for you.

    Writing a completely formal proof is very painstaking, because we need to flesh out a great many details (mostly trivial for experts) before the computer will accept it, and we also need to know how to work with the proof assistant.

    Human proofs often skip these details to stay readable, which also makes them more prone to mistakes. A formalized proof in a proof assistant, by contrast, is very rarely wrong (unless there is an unlikely bug in the assistant's kernel), but it is mostly unreadable (unless the proof is incredibly elegant).

    So in general, translating a good human proof into a computer proof requires more expert labor than major conceptual innovation, but it usually comes with the steep learning curve of understanding the ins and outs of a proof assistant, which can take years of experience.

    LLMs used to be pretty bad at this, because even filling in trivial details could quickly derail them. Recently, a few flagship coding models have finally become able to do it, albeit while consuming a large number of tokens on thinking.
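    As a small illustration of those "trivial details", here is a toy Lean 4 sketch (core Lean only, no Mathlib; the `Demo` namespace and the primed theorem names are made up). On paper, `n + 0 = n` and `0 + n = n` are equally obvious, but because `Nat` addition is defined by recursion on its second argument, only the first follows by unfolding definitions; the second needs an explicit induction.

    ```lean
    namespace Demo

    -- Holds by unfolding the definition of `Nat.add`, so `rfl` suffices.
    theorem add_zero' (n : Nat) : n + 0 = n := rfl

    -- Equally "obvious" on paper, but it needs an explicit induction on `n`.
    theorem zero_add' (n : Nat) : 0 + n = n := by
      induction n with
      | zero => rfl
      | succ k ih =>
        -- The goal for `k + 1` unfolds to `Nat.succ (0 + k) = Nat.succ k`,
        -- which follows from the induction hypothesis `ih : 0 + k = k`.
        exact congrArg Nat.succ ih

    end Demo
    ```

    A real formalization consists of thousands of steps like the second one, which is where most of the expert labor goes.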


  • Many academics around me have a paid LLM plan of some sort; most are on a $100 plan, some on a $200 plan, and all of them get reimbursed for it.

    Most of them use it to optimize code, generate visualizations, or formalize pen-and-paper proofs.

    I hate it and don't use it much myself. But it seems too useful to these people, and it is hard to stop them. For example, formalizing a pen-and-paper proof can take an expert weeks, if not months, of work, whereas it only takes Codex a week.

    But I do feel this success is tied to the nature and values of academia, and might not transfer to other fields or to industrial projects:

    • we usually have tiny codebases: it is not uncommon to have a 10-line algorithm with a 70-page paper explaining its correctness.
    • 90% or more of the code consists of proofs of concept, with no expectation of long-term maintenance.
    • the work is highly specialized, everyone is short on time, and expectations for the outcome are high: in our recent work, we did have a formalization expert, but he didn't have enough time, so a grad student formalized the project using Codex. The overall architecture is probably much worse than what the expert would have produced. One interesting outcome is that Codex was able to prove a more general result than the expert intended: not because it found a better proof, but because it is much better than a human at brute-forcing a solution.