About 29,100 results
  1. CLEVER: A Curated Benchmark for Formally Verified Code …

    July 8, 2025 · TL;DR: We introduce CLEVER, a hand-curated benchmark for verified code generation in Lean. It requires full formal specs and proofs. No few-shot method solves all …

  2. Clever: A Curated Benchmark for Formally Verified Code Generation

    We introduce CLEVER, the first curated benchmark for evaluating the generation of specifications and formally verified code in Lean. The benchmark comprises 161 programming problems; it …

  3. Submissions | OpenReview

    January 22, 2025 · Promoting openness in scientific communication and the peer-review process

  4. STAIR: Improving Safety Alignment with Introspective Reasoning

    May 1, 2025 · One common approach is training models to refuse unsafe queries, but this strategy can be vulnerable to clever prompts, often referred to as jailbreak attacks, which can …

  5. Counterfactual Debiasing for Fact Verification

    In this paper, we have proposed a novel counterfactual framework CLEVER for debiasing fact-checking models. Unlike existing works, CLEVER is augmentation-free and mitigates …

  6. Evaluating the Robustness of Neural Networks: An Extreme Value...

    February 15, 2018 · Our analysis yields a novel robustness metric called CLEVER, which is short for Cross Lipschitz Extreme Value for nEtwork Robustness. The proposed CLEVER score is attack …

  7. On the Planning Abilities of Large Language Models: A Critical ...

    While, as we mentioned earlier, there can be thorny “clever hans” issues about humans prompting LLMs, an automated verifier mechanically backprompting the LLM doesn’t suffer from these. …

  8. Alias-Free Mamba Neural Operator | OpenReview

    September 25, 2024 · Functionally, MambaNO achieves a clever balance between global integration, facilitated by state space model of Mamba that scans the entire function, and local integration, …

  9. EvoTest: Evolutionary Test-Time Learning for Self-Improving …

    September 16, 2025 · A fundamental limitation of current AI agents is their inability to learn complex skills on the fly at test time, often behaving like “clever but clueless interns” in novel …

  10. A Protocol-Driven Platform for Agent-Agnostic Evaluation of LLM …

    September 23, 2025 · Hook it up with TaskConfig—our handy layer for crafting clever input templates and grabbing outputs steadily via JMESPath—and switching agents turns effortless, no extra …