Hover on the video to see corresponding text prompts
A mechanical skeleton girl, standing in place, blowing a kiss toward the viewer.
Mochi+STG
A mechanical skeleton girl, standing in place, blowing a kiss toward the viewer.
Mochi+OSEG
An electric scooter parked against a brick wall.
SDXL+SEG
An electric scooter parked against a brick wall.
SDXL+OSEG
T2V comparison
A top-down microscopic view reveals a petri dish teeming with large cells undergoing mitosis, forming the shape of a smiley face.
CFG+Mochi
STG+Mochi
A top-down microscopic view reveals a petri dish teeming with large cells undergoing mitosis, forming the shape of a smiley face.
OSEG+Mochi
A top-down microscopic view reveals a petri dish teeming with large cells undergoing mitosis, forming the shape of a smiley face.
A low-angle, long exposure shot of a lone female climber, wearing shorts and tank top rock climbing on a massive asteroid in deep space. The climber is suspended against a star-filled void. dramatic shadows across the asteroid's rugged surface, emphasizing the climber's isolation and the scale of the space rock. Dust particles float in the light beams, catching the light. The climber moves methodically, with focused determination.
CFG+Mochi
STG+Mochi
A low-angle, long exposure shot of a lone female climber, wearing shorts and tank top rock climbing on a massive asteroid in deep space. The climber is suspended against a star-filled void. dramatic shadows across the asteroid's rugged surface, emphasizing the climber's isolation and the scale of the space rock. Dust particles float in the light beams, catching the light. The climber moves methodically, with focused determination.
OSEG+Mochi
A low-angle, long exposure shot of a lone female climber, wearing shorts and tank top rock climbing on a massive asteroid in deep space. The climber is suspended against a star-filled void. dramatic shadows across the asteroid's rugged surface, emphasizing the climber's isolation and the scale of the space rock. Dust particles float in the light beams, catching the light. The climber moves methodically, with focused determination.
A wide-angle shot shows a serene monk meditating perched a top of the letter E of a pile of weathered rocks that vertically spell out 'ZE'. The rock formation is perched atop a misty mountain peak at sunrise. The warm light bathes the monk in a gentle glow, highlighting the folds of his saffron robes. The sky behind him is a soft gradient of pink and orange, creating a tranquil backdrop. The camera slowly zooms in, capturing the monk's peaceful expression and the intricate details of the rocks. The scene is bathed in a soft, ethereal light, emphasizing the spiritual atmosphere.
CFG+Mochi
STG+Mochi
A wide-angle shot shows a serene monk meditating perched a top of the letter E of a pile of weathered rocks that vertically spell out 'ZE'. The rock formation is perched atop a misty mountain peak at sunrise. The warm light bathes the monk in a gentle glow, highlighting the folds of his saffron robes. The sky behind him is a soft gradient of pink and orange, creating a tranquil backdrop. The camera slowly zooms in, capturing the monk's peaceful expression and the intricate details of the rocks. The scene is bathed in a soft, ethereal light, emphasizing the spiritual atmosphere.
OSEG+Mochi
A wide-angle shot shows a serene monk meditating perched a top of the letter E of a pile of weathered rocks that vertically spell out 'ZE'. The rock formation is perched atop a misty mountain peak at sunrise. The warm light bathes the monk in a gentle glow, highlighting the folds of his saffron robes. The sky behind him is a soft gradient of pink and orange, creating a tranquil backdrop. The camera slowly zooms in, capturing the monk's peaceful expression and the intricate details of the rocks. The scene is bathed in a soft, ethereal light, emphasizing the spiritual atmosphere.
A young man at his 20s is sitting on a piece of cloud in the sky, reading a book.
CFG+Mochi
STG+Mochi
A young man at his 20s is sitting on a piece of cloud in the sky, reading a book.
OSEG+Mochi
A young man at his 20s is sitting on a piece of cloud in the sky, reading a book.
A high view wide angle shot of two octopuses gliding slowly through a sunlit patch of ocean. They float closer to one another in fluid, spiraling motion. The scene unfolds in a shallow, coral-framed cove where beams of light filter through the water and dance over soft sand. One octopus is a rich, reddish orange, the other a delicate blend of violet and pale blue. As they near, their arms begin to entwine in a gentle, synchronized movement—tactile, intimate, and unhurried. Around them, small tropical fish dart through the water, and as the octopuses embrace, the fish begin to shift, forming a loose, shimmering heart shape around the couple. The camera slowly pushes in as the pulsing bodies continue their dance, their tentacles crossing and curling. In the final moment, two arms intertwine at the center of the frame, forming a perfect heart shape in close-up, framed by the swirling school. The light catches every scale and ripple, holding the ocean in a breathless stillness. Filmed in 16mm film.
CFG+Mochi
STG+Mochi
A high view wide angle shot of two octopuses gliding slowly through a sunlit patch of ocean. They float closer to one another in fluid, spiraling motion. The scene unfolds in a shallow, coral-framed cove where beams of light filter through the water and dance over soft sand. One octopus is a rich, reddish orange, the other a delicate blend of violet and pale blue. As they near, their arms begin to entwine in a gentle, synchronized movement—tactile, intimate, and unhurried. Around them, small tropical fish dart through the water, and as the octopuses embrace, the fish begin to shift, forming a loose, shimmering heart shape around the couple. The camera slowly pushes in as the pulsing bodies continue their dance, their tentacles crossing and curling. In the final moment, two arms intertwine at the center of the frame, forming a perfect heart shape in close-up, framed by the swirling school. The light catches every scale and ripple, holding the ocean in a breathless stillness. Filmed in 16mm film.
OSEG+Mochi
A high view wide angle shot of two octopuses gliding slowly through a sunlit patch of ocean. They float closer to one another in fluid, spiraling motion. The scene unfolds in a shallow, coral-framed cove where beams of light filter through the water and dance over soft sand. One octopus is a rich, reddish orange, the other a delicate blend of violet and pale blue. As they near, their arms begin to entwine in a gentle, synchronized movement—tactile, intimate, and unhurried. Around them, small tropical fish dart through the water, and as the octopuses embrace, the fish begin to shift, forming a loose, shimmering heart shape around the couple. The camera slowly pushes in as the pulsing bodies continue their dance, their tentacles crossing and curling. In the final moment, two arms intertwine at the center of the frame, forming a perfect heart shape in close-up, framed by the swirling school. The light catches every scale and ripple, holding the ocean in a breathless stillness. Filmed in 16mm film.
A front-facing wide angle shot of two inflatable duck floaties. They drift toward each other until their beaks meet in a gentle kiss and then magically one of them closes his eyes. The moment takes place in a quiet backyard pool under warm afternoon sun, with calm water and tiled edges reflecting soft light. One floatie is slightly faded, with creases from long use; the other is bright yellow, glossy and new. As they float closer, their heads tilt subtly, and the space between their beaks, rounded heads, and arched necks forms an evident heart shape from this angle. Just as they touch, surprisingly each floatie magically winks—one eye narrowing in a playful, synchronized gesture. The water stays nearly still, disturbed only by their closeness. Soft highlights ripple across their surfaces, and the pool glows in a warm, summery tone. Filmed in 16mm film.
CFG+Mochi
STG+Mochi
A front-facing wide angle shot of two inflatable duck floaties. They drift toward each other until their beaks meet in a gentle kiss and then magically one of them closes his eyes. The moment takes place in a quiet backyard pool under warm afternoon sun, with calm water and tiled edges reflecting soft light. One floatie is slightly faded, with creases from long use; the other is bright yellow, glossy and new. As they float closer, their heads tilt subtly, and the space between their beaks, rounded heads, and arched necks forms an evident heart shape from this angle. Just as they touch, surprisingly each floatie magically winks—one eye narrowing in a playful, synchronized gesture. The water stays nearly still, disturbed only by their closeness. Soft highlights ripple across their surfaces, and the pool glows in a warm, summery tone. Filmed in 16mm film.
OSEG+Mochi
A front-facing wide angle shot of two inflatable duck floaties. They drift toward each other until their beaks meet in a gentle kiss and then magically one of them closes his eyes. The moment takes place in a quiet backyard pool under warm afternoon sun, with calm water and tiled edges reflecting soft light. One floatie is slightly faded, with creases from long use; the other is bright yellow, glossy and new. As they float closer, their heads tilt subtly, and the space between their beaks, rounded heads, and arched necks forms an evident heart shape from this angle. Just as they touch, surprisingly each floatie magically winks—one eye narrowing in a playful, synchronized gesture. The water stays nearly still, disturbed only by their closeness. Soft highlights ripple across their surfaces, and the pool glows in a warm, summery tone. Filmed in 16mm film.
T2I comparison
Compared with existing state-of-the-art guidance methods, combining OSEG with CFG noticeably improves spatial resolution while preserving the natural semantic coherence of the structures within each sample. The combination strengthens fine-grained details and overall harmony. Examples include the corrected details of the basketball player and the hoop, the cinematic detail of the man's shadowed face, the appropriate scale of the burger, the highly detailed horse, the correct structure of the mandolin, and the robot with a precise and meaningful spatial structure. These results demonstrate OSEG's ability to capture nuanced atmospheric depth and stylistic fidelity, producing outputs with greater realism and artistic coherence than the compared methods.
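As a rough illustration of what "alongside CFG" can mean, the sketch below adds a second guidance term on top of the standard classifier-free guidance update. It follows the form in which perturbed-branch guidance methods are commonly combined with CFG; this page does not state OSEG's exact formulation, so the function name `combine_guidance`, the argument `eps_perturbed`, and both scale values are assumptions made for illustration only.

```python
import numpy as np


def combine_guidance(eps_uncond, eps_cond, eps_perturbed,
                     cfg_scale=7.5, extra_scale=3.0):
    """Hypothetical sketch: CFG plus one extra guidance term.

    eps_uncond    -- noise prediction without the text prompt
    eps_cond      -- noise prediction with the text prompt
    eps_perturbed -- prediction from a deliberately degraded branch
                     (assumed here; how OSEG builds it is not shown)
    """
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps_cond = np.asarray(eps_cond, dtype=float)
    eps_perturbed = np.asarray(eps_perturbed, dtype=float)

    # Standard CFG direction: move away from the unconditional prediction.
    cfg_term = cfg_scale * (eps_cond - eps_uncond)
    # Extra direction: move away from the degraded prediction.
    extra_term = extra_scale * (eps_cond - eps_perturbed)
    return eps_uncond + cfg_term + extra_term


# Toy usage with constant arrays standing in for model outputs.
guided = combine_guidance(np.zeros(4), np.ones(4), np.full(4, 0.5))
```

With `extra_scale=0` the function reduces to plain CFG, which makes it easy to compare the two settings on the same inputs.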
A basketball player jumping, crowd cheering in the background, with the ball toward the hoop.
SDXL+SAG
A basketball player jumping, crowd cheering in the background, with the ball toward the hoop.
SDXL+SEG
A basketball player jumping, crowd cheering in the background, with the ball toward the hoop.
SDXL+OSEG
A handsome man.
SDXL+SAG
A handsome man.
SDXL+SEG
A handsome man.
SDXL+OSEG
A close up photo of a burger, close-shot, macro-quality.
SDXL+SAG
A close up photo of a burger, close-shot, macro-quality.
SDXL+SEG
A close up photo of a burger, close-shot, macro-quality.
SDXL+OSEG
Two dogs, one cat.
SDXL+SAG
Two dogs, one cat.
SDXL+SEG
Two dogs, one cat.
SDXL+OSEG
An old man playing mandolin with his dog.
SDXL+SAG
An old man playing mandolin with his dog.
SDXL+SEG
An old man playing mandolin with his dog.
SDXL+OSEG
A robot is painting potrait.
SDXL+SAG
A robot is painting potrait.
SDXL+SEG
A robot is painting potrait.
SDXL+OSEG
Ablation Study
1. Impact of different values of \( \sigma \) for OSEG: as \( \sigma \) increases, only negligible changes (red boxes) appear in the overall generation.
A beautiful motorbike, front view, high quality.
\( \sigma = 3 \)
A beautiful motorbike, front view, high quality.
\( \sigma = 5 \)
A beautiful motorbike, front view, high quality.
\( \sigma = \infty \)
2. Unconditional generation.
SDXL+PAG
SDXL+SEG
SDXL+OSEG
SDXL+PAG
SDXL+SEG
SDXL+OSEG
Contact Us
Feel free to contact Nazmus Saqib at nsaqib1995@gmail.com for any questions, cooperation, or communication.
If you find this work useful, please consider citing:
@article{zhang2025trainingfreeefficientvideogeneration,
title={Training-Free Efficient Video Generation via Dynamic Token Carving},
author={Yuechen Zhang and Jinbo Xing and Bin Xia and Shaoteng Liu and Bohao Peng and Xin Tao and Pengfei Wan and Eric Lo and Jiaya Jia},
journal={arXiv preprint arXiv:2505.16864},
year={2025}
}
Thanks to UltraPixel, ControlNeXt, and ToonCrafter for providing the project page template!