Research Papers
25 papers from HuggingFace Daily Papers
- Introduces a camera‑guided retrieval module that pulls relevant latent frames from a pre‑built spatio‑temporal memory, ensuring consistent geometry across different viewpoints.
- Introduces **Pixel‑Perfect Depth (PPD)**, a monocular depth model that operates directly in pixel space using diffusion transformers, eliminating flying pixels and preserving fine scene details.
- Introduces a “reason‑when‑necessary” policy that triggers deep reasoning only for ambiguous video frames, reducing unnecessary computation.
- Directly applying GRPO’s group‑wise normalization to a mixture of rewards collapses distinct advantage signals into near‑identical values, hurting learning dynamics.
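The collapse is easy to see numerically. Below is a toy sketch (invented reward values, not the paper's setup) of GRPO‑style group‑wise normalization applied to the sum of two reward components on very different scales:

```python
import numpy as np

# Toy numbers: a small-scale reward (e.g. formatting) summed with a
# large-scale reward (e.g. correctness) before group normalization.
format_r = np.array([0.0, 1.0, 0.0, 1.0])     # small-scale component
correct_r = np.array([10.0, 10.0, 0.0, 0.0])  # large-scale component
mixed = format_r + correct_r

# GRPO-style group-wise normalization: subtract the group mean,
# divide by the group standard deviation.
adv = (mixed - mixed.mean()) / (mixed.std() + 1e-8)

# Samples 0 and 1 differ by a full unit of format reward, yet their
# normalized advantages are nearly identical: the large-scale component
# dominates the group statistics and the small signal is squashed.
print(adv)  # ≈ [ 0.90  1.09 -1.09 -0.90]
```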
- Attaching a learnable scalar multiplier to each weight matrix lets the model escape the suboptimal weight‑norm equilibrium imposed by fixed weight decay.
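A minimal sketch of the simplest form of this parameterization (one scalar g per matrix, computing y = g·Wx; the paper's exact placement of the scalar may differ). Because weight decay acts only on W, any shrinkage of ‖W‖ can be offset by growing g, so the effective weight norm is no longer pinned at the decay equilibrium:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each weight matrix W gets a learnable scalar multiplier g.
W = rng.normal(size=(4, 4))
g = np.array(1.0)  # learnable scalar, initialized to 1

def layer(x, W, g):
    # layer output: y = g * (W @ x)
    return g * (W @ x)

# Weight decay shrinks ||W||, but the effective weights g*W can keep
# any norm: halving ||W|| while doubling g leaves the function intact.
x = rng.normal(size=4)
assert np.allclose(layer(x, W, g), layer(x, 0.5 * W, 2.0 * g))
```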
- Introduces a unified 4D representation (static background point cloud + per‑object 3D Gaussian trajectories) that captures both camera motion and object dynamics in space‑time.
- The paper proves an Ω(T^{2/3}) information‑theoretic lower bound on expected multicalibration error even when only three disjoint binary groups are used, matching known upper bounds up to log factors.
- Across diverse domains and architectures, a tiny, fixed subset of experts (the “standing committee”) receives the majority of routing votes, contradicting the expected domain‑specific specialization.
- A correspondence‑based data engine turns a single human demonstration into thousands of high‑quality, category‑wide synthetic training examples by morphing object meshes, transferring the expert grasp...
- QNeRF replaces large MLPs in NeRF with parameterised quantum circuits, exploiting superposition and entanglement to encode spatial and view‑dependent features.
- RelayLLM lets a small language model act as a controller, emitting a special command token to summon the large model only for critical tokens, reducing LLM usage to ~1 % of generated tokens.
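The control flow can be sketched as a toy decode loop; all identifiers below are illustrative stand‑ins, not RelayLLM's actual interface, and the "critical token" condition is a placeholder:

```python
# Toy relay loop: the small model generates tokens, and when it emits a
# special CALL token, the large model supplies that token instead.
CALL = "<call_llm>"

def small_model(prefix):
    # stand-in policy: defer on every 4th position ("hard" tokens),
    # otherwise emit a filler token
    return CALL if len(prefix) % 4 == 3 else "small"

def large_model(prefix):
    return "LARGE"

def generate(n_tokens):
    out, large_calls = [], 0
    for _ in range(n_tokens):
        tok = small_model(out)
        if tok == CALL:
            tok = large_model(out)  # summon the large model once
            large_calls += 1
        out.append(tok)
    return out, large_calls

tokens, calls = generate(20)
print(f"large-model usage: {calls}/{len(tokens)} tokens")
```

In the paper's setting the controller triggers far more rarely (~1 % of tokens); the toy condition here just makes the relay visible in a short run.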
- A compact spatio‑temporal latent space encodes an entire animation sequence in one forward pass, enabling “one‑shot” reconstruction of 3D shape and motion.
- Introduces a hybrid pipeline that first applies a bespoke statistical gray‑pixel detector to estimate illumination in noisy, low‑light scenes.
- Pure LLM judges often mis‑evaluate complex, multi‑step outputs because they lack explicit reasoning and verification mechanisms.
- Introduces GREx, a unified benchmark that expands traditional referring expression tasks (RES, REC, REG) to support single‑target, multi‑target, and no‑target expressions, enabling more realistic and ...
- Introducing “visual identity prompting” supplies diffusion models with explicit object cues, enabling generation of consistent multi‑view videos that preserve object appearance across frames.
- Turn‑level tree search injects diverse, forward‑looking trajectories, dramatically improving exploration in multi‑turn environments.
- Tokens with the highest predictive entropy dominate the semantic output of V‑L models; tampering only with these few tokens yields large degradations.
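The ranking step is straightforward to sketch. The snippet below uses hand‑picked toy distributions rather than real model outputs, and the top‑k selection is a generic stand‑in for the paper's procedure:

```python
import numpy as np

def token_entropy(probs):
    """Shannon entropy (nats) of each token's predictive distribution."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

# Toy predictive distributions for 4 tokens over a 5-word vocabulary.
probs = np.array([
    [0.20, 0.20, 0.20, 0.20, 0.20],  # uniform: maximal entropy
    [0.40, 0.30, 0.15, 0.10, 0.05],
    [0.96, 0.01, 0.01, 0.01, 0.01],  # peaked: near-zero entropy
    [0.30, 0.30, 0.20, 0.10, 0.10],
])

H = token_entropy(probs)
top_k = np.argsort(H)[::-1][:2]  # the few highest-entropy tokens
print("highest-entropy token indices:", top_k)
```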
- Traditional Transformers and RNNs reside in a “Metric Phase” where causal order can be broken by semantic noise, causing hallucinations.
- A shared hypernetwork generates client‑specific VAE decoders and class‑conditional latent priors from lightweight private codes, enabling personalization without exposing raw data.
- Making SSM parameters input‑dependent gives the model content‑based gating, allowing selective propagation or forgetting of information and closing the performance gap with attention on discrete modalities.
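A one‑dimensional toy of the idea, with a sigmoid gate standing in for the actual SSM parameterization: the decay of the recurrent state depends on the current input, so the scan can retain its state through some inputs and be overwritten by others.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def selective_scan(x, w_gate):
    """1-D toy of an input-dependent SSM recurrence:
    h_t = a(x_t) * h_{t-1} + (1 - a(x_t)) * x_t."""
    h, hs = 0.0, []
    for x_t in x:
        a_t = sigmoid(w_gate * x_t)      # content-based gate in (0, 1)
        h = a_t * h + (1.0 - a_t) * x_t  # retain past vs. write input
        hs.append(h)
    return np.array(hs)

# With w_gate = 5, positive inputs are mostly filtered out (gate ~ 1,
# state retained), while a negative input overwrites the state (gate ~ 0).
x = np.array([1.0, 0.0, 0.0, 0.0, -1.0])
out = selective_scan(x, w_gate=5.0)
print(out)
```

A fixed (input‑independent) gate cannot make this per‑token choice, which is the gap this mechanism closes.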
- Treating attention matrices as token‑level graphs lets spectral analysis separate sound from unsound mathematical proofs.
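One way to make this concrete (a toy proxy; the paper's actual spectral statistic may differ) is to symmetrize the attention matrix into a weighted token graph and inspect its Laplacian spectrum, where near‑zero algebraic connectivity flags a fragmented reasoning chain:

```python
import numpy as np

def attention_spectrum(A):
    """Treat an attention matrix as a weighted token graph and return
    the sorted eigenvalues of its combinatorial graph Laplacian."""
    W = 0.5 * (A + A.T)             # symmetrize attention into weights
    np.fill_diagonal(W, 0.0)        # drop self-attention loops
    L = np.diag(W.sum(axis=1)) - W  # Laplacian: degree minus adjacency
    return np.sort(np.linalg.eigvalsh(L))

# Coherent "proof": attention flows along one chain of tokens.
chain = np.eye(4, k=-1) * 1.0 + np.eye(4) * 0.5
# Fragmented "proof": two disconnected token clusters.
blocks = np.kron(np.eye(2), np.ones((2, 2)))

ev_chain = attention_spectrum(chain)
ev_blocks = attention_spectrum(blocks)
# 2nd-smallest eigenvalue (algebraic connectivity):
# ~0 means the token graph is disconnected.
print(ev_chain[1], ev_blocks[1])
```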
- Reformulates multimodal reasoning as a native image‑to‑image generation task, enabling direct manipulation of visual information instead of indirect text prompts.
- Conventional RAG memories act as static fact repositories, neglecting the higher‑order relations needed for deep reasoning.
- DLCM learns variable‑length “concepts” on the fly, moving computation from dense token streams to a compact latent space where reasoning is cheaper and more focused.