arxiv:2512.23851

Pretraining Frame Preservation in Autoregressive Video Memory Compression

Published on Dec 29, 2025 · Submitted by Lvmin Zhang on Jan 1, 2026
Authors: Lvmin Zhang, Shengqu Cai, Muyang Li, Chong Zeng, Beijia Lu, Anyi Rao, Song Han, Gordon Wetzstein, Maneesh Agrawala
Abstract

We present PFP, a neural network structure that compresses long videos into short contexts, with an explicit pretraining objective to preserve the high-frequency details of single frames at arbitrary temporal positions. The baseline model can compress a 20-second video into a context of about 5k in length, from which random frames can be retrieved with perceptually preserved appearance. Such pretrained models can be directly fine-tuned as memory encoders for autoregressive video models, enabling long-history memory with low context cost and relatively low fidelity loss. We evaluate the framework with ablative settings and discuss the trade-offs of possible neural architecture designs.
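For intuition, here is a minimal, self-contained sketch of the pretraining setup the abstract describes: a compressor squeezes a long run of frame latents into a short fixed-length context, and a decoder is trained to reconstruct the frame at a randomly sampled temporal position from that context alone. This is not the authors' implementation; the attention-based compressor, all module names, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PFPSketch(nn.Module):
    """Compress a long sequence of frame latents into a short fixed-size
    context, then reconstruct the latent of a frame at an arbitrary
    temporal position from that context alone (hypothetical design)."""

    def __init__(self, frame_dim=64, ctx_len=512, dim=256):
        super().__init__()
        self.proj_in = nn.Linear(frame_dim, dim)
        # Learned context slots: the whole clip is squeezed into ctx_len tokens.
        self.ctx_queries = nn.Parameter(torch.randn(ctx_len, dim) * 0.02)
        self.compress = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.retrieve = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.time_mlp = nn.Sequential(nn.Linear(1, dim), nn.GELU(), nn.Linear(dim, dim))
        self.proj_out = nn.Linear(dim, frame_dim)

    def forward(self, frames, t):
        # frames: (B, T, frame_dim) per-frame latents; t: (B, 1), query position in [0, 1].
        x = self.proj_in(frames)                                 # (B, T, dim)
        slots = self.ctx_queries.expand(frames.shape[0], -1, -1)
        ctx, _ = self.compress(slots, x, x)                      # cross-attend: clip -> short context
        tq = self.time_mlp(t).unsqueeze(1)                       # (B, 1, dim) timestep query
        out, _ = self.retrieve(tq, ctx, ctx)                     # read one frame back out of memory
        return self.proj_out(out.squeeze(1))                     # (B, frame_dim)

# One frame-preservation pretraining step: reconstruct a randomly chosen frame.
model = PFPSketch()
B, T = 2, 240                                                   # a "long" clip of 240 frame latents
frames = torch.randn(B, T, 64)
idx = torch.randint(T, (B,))
t = (idx.float() / (T - 1)).unsqueeze(1)                        # normalized temporal position
loss = F.mse_loss(model(frames, t), frames[torch.arange(B), idx])
loss.backward()
```

In the fine-tuning stage the paper describes, the compressed context would be handed to an autoregressive video generator as its long-horizon memory; the MSE reconstruction here merely stands in for the frame-preservation objective.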

Community

Paper submitter (lllyasviel):

https://arxiv.org/abs/2512.23851
Repo: https://github.com/lllyasviel/PFP

Reply (DealayLomoi): It seems that the Github link is no longer available.

This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* [VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory](https://huggingface.co/papers/2512.04519) (2025)
* [TempoMaster: Efficient Long Video Generation via Next-Frame-Rate Prediction](https://huggingface.co/papers/2511.12578) (2025)
* [Memorize-and-Generate: Towards Long-Term Consistency in Real-Time Video Generation](https://huggingface.co/papers/2512.18741) (2025)
* [Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression](https://huggingface.co/papers/2512.05081) (2025)
* [Autoregressive Video Autoencoder with Decoupled Temporal and Spatial Context](https://huggingface.co/papers/2512.11293) (2025)
* [StoryMem: Multi-shot Long Video Storytelling with Memory](https://huggingface.co/papers/2512.19539) (2025)
* [FilmWeaver: Weaving Consistent Multi-Shot Videos with Cache-Guided Autoregressive Diffusion](https://huggingface.co/papers/2512.11274) (2025)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out [this Space](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers).

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`

arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/pretraining-frame-preservation-in-autoregressive-video-memory-compression-6536-d0522150

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications
