Learning Library


Reinforcement Learning

5 items in this topic

Paper

Efficient Video Reasoning with Dual-Answer Training

  • Introduces a “reason‑when‑necessary” policy that triggers deep reasoning only for ambiguous video frames, reducing unnecessary computation.
  • Proposes a “Thinking Once, Answering Twice” paradigm where the model generates an intermediate reasoning trace before producing two complementary answers, improving answer consistency.
Paper

Decoupled Reward Normalization for Stable Multi‑Reward RL

  • Directly applying GRPO’s group‑wise normalization to a mixture of rewards collapses distinct advantage signals into near‑identical values, hurting learning dynamics.
  • GDPO separates (decouples) the normalization step for each reward component, preserving their relative magnitudes before a final batch‑wise advantage scaling.
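The contrast between the two normalization schemes can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation: function names, the epsilon term, and the final batch-wise scaling step are assumptions based on the summary above.

```python
import numpy as np

def grpo_mixed_advantages(rewards, eps=1e-8):
    """GRPO-style: sum the reward components first, then apply one
    group-wise normalization. Distinct signals get blended together
    before normalization, so their relative structure can collapse.
    rewards: array of shape (num_components, group_size)."""
    total = rewards.sum(axis=0)
    return (total - total.mean()) / (total.std() + eps)

def gdpo_decoupled_advantages(rewards, eps=1e-8):
    """Decoupled sketch: normalize each reward component separately
    (preserving per-component structure), sum the normalized signals,
    then apply a final batch-wise advantage scaling."""
    per_component = (rewards - rewards.mean(axis=1, keepdims=True)) / (
        rewards.std(axis=1, keepdims=True) + eps)
    combined = per_component.sum(axis=0)
    return combined / (combined.std() + eps)
```

With two reward components of very different scales, the two schemes produce visibly different advantage rankings: the summed-then-normalized version is dominated by the larger-magnitude component, while the decoupled version weights each component's within-group structure equally.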
Paper

RL‑AWB: Reinforcement Learning for Nighttime White Balance

  • Introduces a hybrid pipeline that first applies a bespoke statistical gray‑pixel detector to estimate illumination in noisy, low‑light scenes.
  • Develops the first deep reinforcement learning (DRL) agent that treats the statistical estimator as its environment, learning to fine‑tune AWB parameters per‑image in a manner akin to a human expert.
Paper

Visual Identity Prompted Multi‑View Video Augmentation for Robotics

  • Introduces “visual identity prompting,” which supplies diffusion models with explicit object cues, enabling generation of consistent multi‑view videos that preserve object appearance across frames.
  • The generated videos serve as high‑fidelity data augmentations, enriching the visual diversity of manipulation datasets without manual collection.
Paper

Tree‑Search Guided Multi‑Turn Policy Optimization

  • Turn‑level tree search injects diverse, forward‑looking trajectories, dramatically improving exploration in multi‑turn environments.
  • By formulating separate learning objectives for each turn, AT²PO provides clearer credit assignment across long horizons.