TL;DR

Alibaba confirmed today that it built Happy Horse 1.0 — the anonymous AI video model that’s been sitting at #1 on Artificial Analysis’s Video Arena for days with nobody knowing who made it. The 15-billion-parameter model generates 1080p video and synchronized audio in a single forward pass, beating ByteDance’s Seedance 2.0 by 74 Elo points in text-to-video. It’s fully open-source, and the guy who built it is Zhang Di, the same engineer who created Kling at Kuaishou.

The Unmasking

For the past week, a model called “HappyHorse-1.0” has been dominating the Artificial Analysis Video Arena, the most widely cited benchmark for AI video generation. It climbed to #1 in text-to-video, #1 in image-to-video, and #2 in both audio-inclusive categories. Nobody knew who built it.

Today, Alibaba’s Taotian Group, the company’s e-commerce division, confirmed through a Bloomberg report that Happy Horse came out of its Future Life Lab. The model was submitted to the arena anonymously, racked up wins in blind human preference tests, and only after it locked the top spot did Alibaba step forward.

This follows a pattern we’ve seen before in Chinese AI: submit anonymously, let the benchmarks speak, then claim credit. DeepSeek did something similar with their early models. But Happy Horse’s margin of victory is unusual. 74 Elo points over the next best model in text-to-video is the largest gap the leaderboard has ever recorded.

The Numbers

  • T2V Elo: 1,347 (#1)
  • I2V Elo: 1,406 (#1)
  • Parameters: 15B
  • Generation time: 38s per clip on an H100

Here’s how the current Artificial Analysis Video Arena looks:

| Category | #1 | Elo | #2 | Elo | Gap (Happy Horse margin) |
|---|---|---|---|---|---|
| Text-to-Video (no audio) | Happy Horse 1.0 | ~1,347 | Seedance 2.0 | ~1,273 | +74 |
| Image-to-Video (no audio) | Happy Horse 1.0 | ~1,406 | Seedance 2.0 | ~1,355 | +51 |
| Text-to-Video (with audio) | Seedance 2.0 | ~1,220 | Happy Horse 1.0 | ~1,215 | -5 |
| Image-to-Video (with audio) | Seedance 2.0 | ~1,161 | Happy Horse 1.0 | ~1,155 | -6 |

A 74-point Elo gap in text-to-video means Happy Horse wins roughly 60% of blind matchups against the #2 model. For context, OpenAI’s Sora 2 Pro sits at #4 on the same leaderboard. And Sora is shutting down on April 26.
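
That 60% figure comes straight from the standard Elo expected-score formula, which converts a rating gap into a win probability. A quick Python check:

```python
# Standard Elo expected-score formula: probability that the
# higher-rated model wins a single blind matchup.
def elo_win_prob(gap: float) -> float:
    return 1 / (1 + 10 ** (-gap / 400))

print(f"{elo_win_prob(74):.1%}")  # ~60.5% for the 74-point T2V gap
print(f"{elo_win_prob(51):.1%}")  # ~57.3% for the 51-point I2V gap
```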

The one area where Happy Horse doesn’t lead is audio-inclusive generation, where Seedance 2.0 edges it out by a few points. The scores are nearly identical, but keep the gap in mind if your use case depends on sound quality.

The Man Who Built Kling, Then Beat It

Happy Horse was built by Zhang Di, who previously served as VP of Technology at Kuaishou, China’s second-largest short video platform. At Kuaishou, Zhang Di was the architect behind Kling 1.0 and Kling 2.0, which were the dominant AI video models in China through most of 2025.

He left Kuaishou for Alibaba’s Taotian Group, where he now runs the Future Life Lab. And his first major output there just beat the models he built at his old job. Kling 3.0 currently sits outside the top 3 on the Artificial Analysis leaderboard.

There’s a recurring dynamic in Chinese AI where star engineers move between companies and their institutional knowledge reshapes entire product lines. Zhang Di’s move from Kuaishou to Alibaba now shows up directly in the leaderboard results.

How It Works: One Model, One Pass

Most AI video generators follow a multi-stage pipeline: generate silent video, run a separate audio model, then synchronize the lip movements. Happy Horse skips all of that.

The model is a 15-billion-parameter unified self-attention Transformer that processes text, image, video, and audio tokens in a single sequence. No cross-attention modules, no separate audio branch, no conditioning network bolted on the side. Everything runs through one architecture.

Text prompt → Tokenizer → Unified Transformer (15B) → Video + Audio output
                    Single token sequence:
                    [text] [image] [video] [audio]
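
To make the single-sequence idea concrete, here is a minimal PyTorch sketch of one self-attention stack over interleaved modality tokens. The layer sizes, token counts, and modality-embedding scheme are illustrative assumptions, not Happy Horse’s published architecture:

```python
import torch
import torch.nn as nn

# Toy hyperparameters for illustration only -- not the real model's.
D_MODEL, N_HEADS, N_LAYERS = 512, 8, 6

class UnifiedSequenceModel(nn.Module):
    def __init__(self):
        super().__init__()
        # One shared self-attention stack: no cross-attention modules,
        # no separate audio branch.
        layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=N_HEADS, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=N_LAYERS)
        # Learned embedding that tags each token with its modality.
        self.modality_emb = nn.Embedding(4, D_MODEL)  # text/image/video/audio

    def forward(self, tokens, modality_ids):
        # tokens:       (batch, seq, d_model), already-embedded tokens
        # modality_ids: (batch, seq) in {0: text, 1: image, 2: video, 3: audio}
        x = tokens + self.modality_emb(modality_ids)
        return self.backbone(x)  # every token attends to every other token

# Build one interleaved sequence: [text] [image] [video] [audio].
lengths = (16, 64, 256, 128)  # made-up token counts per modality
tokens = torch.cat([torch.randn(1, n, D_MODEL) for n in lengths], dim=1)
ids = torch.cat([torch.full((1, n), i) for i, n in enumerate(lengths)], dim=1)
out = UnifiedSequenceModel()(tokens, ids)  # video and audio tokens co-attend
```

The point of the sketch: because video and audio tokens live in the same sequence, one attention pass can trade information between them, which is where the free temporal alignment discussed below comes from.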

Two technical details that matter for performance:

  1. DMD-2 distillation: The model generates a 1080p clip in just 8 denoising steps without classifier-free guidance. On an H100, that translates to about 38 seconds per clip. Undistilled diffusion models typically need 50-100 steps. (A few-step sampling loop is sketched after this list.)

  2. Multilingual lip-sync: Happy Horse natively supports six languages (Chinese, English, Japanese, Korean, German, and French) with synchronized lip movements. Most competitors handle this through post-processing.
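
For intuition on what 8-step generation looks like, here is a hedged sketch of a generic few-step denoising loop (closer to consistency-model multistep sampling than to DMD-2’s actual recipe, which is more involved). The `denoiser` interface and the re-noising schedule are assumptions for illustration:

```python
import torch

@torch.no_grad()
def sample_few_step(denoiser, shape, num_steps=8, device="cuda"):
    """Few-step sampling: one denoiser call per step, no
    classifier-free guidance anywhere in the loop."""
    x = torch.randn(shape, device=device)            # start from pure noise
    # Noise levels from 1.0 (pure noise) down to 0.0 (clean).
    ts = torch.linspace(1.0, 0.0, num_steps + 1, device=device)
    for t, t_next in zip(ts[:-1], ts[1:]):
        x0_pred = denoiser(x, t)                     # predict the clean sample
        if t_next > 0:
            # Re-noise the prediction down to the next noise level.
            x = x0_pred + t_next * torch.randn_like(x)
        else:
            x = x0_pred                              # final step: keep it
    return x

# Hypothetical usage: 8 denoiser calls instead of the usual 50-100.
# latents = sample_few_step(distilled_model, shape=(1, 16, 4, 135, 240))
```

Eight forward passes through a 15B model instead of 50-100 is most of where the 38-second figure comes from.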

The single-pass approach is also what gives it the quality edge. When audio and video are generated together, temporal alignment comes for free. In multi-stage pipelines, even small synchronization errors accumulate into uncanny-valley territory.

What This Means for AI Video

Three observations about where the market stands now:

Chinese labs own the top of the leaderboard. The top four models on the Artificial Analysis Video Arena (Happy Horse, Seedance 2.0, SkyReels V4, and Kling 3.0) are all Chinese. Google’s Veo 3 and OpenAI’s Sora 2 Pro are fighting for spots lower down. A year ago, Sora’s announcement dominated AI video headlines; now it’s shutting down while Chinese labs hold the entire top four.

Open-source is winning in video too. Happy Horse will be fully open-sourced: weights, distilled models, super-resolution modules, and inference code. This follows the pattern Stability AI set in image generation, now being replicated in video. When the weights drop, anyone with an H100, an A100 80GB, or another GPU with 48GB+ VRAM can run it locally. That matters for startups building video products that don’t want to depend on API pricing from closed-source providers.

And then there’s the talent angle. Alibaba hired Zhang Di, gave him a small lab, and let him rebuild what he already knew how to build. The model came from a “Future Life Lab” inside Alibaba’s e-commerce arm, far from the company’s official AI research division.

When Can You Use It?

Here’s the timeline Alibaba has given:

  • The model is already live on Artificial Analysis Video Arena for blind testing
  • Weights are “coming soon” (no exact date, but a GitHub repo exists with a placeholder)
  • API launch is reportedly planned for April 30
  • Pricing hasn’t been announced for the API yet, but self-hosting will be free
  • You’ll need an H100 or A100 with 48GB+ VRAM for local inference

If you want to try it before the API goes live, the only current option is the Artificial Analysis arena, where you can see Happy Horse outputs in blind comparisons against other models.

FAQ

Is Happy Horse actually open-source?

Alibaba has confirmed it will be. But as of today, the GitHub repository shows “coming soon” for weights and inference code. The commitment is there; the files aren’t yet. I’d wait for the actual weight release before planning around it.

How does Happy Horse compare to Sora?

Sora 2 Pro currently sits at #4 on the text-to-video leaderboard, well behind Happy Horse at #1. More relevantly, OpenAI announced on March 24 that Sora is shutting down. The app closes April 26 and the API follows on September 24. OpenAI was reportedly losing $120 million per month on Sora compute costs.

Can I run it locally?

Once weights are released, yes, if you have the hardware. The model needs at least 48GB of VRAM, which means an H100, an A100 80GB, or possibly two consumer GPUs with NVLink. It won’t run on a gaming card anytime soon. Expect the community to start working on quantized versions shortly after the weights drop, though. If you’ve set up Ollama or llama.cpp for Gemma 4, the workflow for loading new open models will be familiar.
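
The 48GB floor is consistent with back-of-the-envelope math, assuming bf16 weights; the precision and overhead factor here are my assumptions, not published specs:

```python
# Rough VRAM estimate for a 15B-parameter model.
params = 15e9
weights_gb = params * 2 / 1e9   # bf16: 2 bytes per parameter -> 30 GB
overhead_gb = weights_gb * 0.5  # assumed allowance for activations, caches
print(f"~{weights_gb + overhead_gb:.0f} GB")  # ~45 GB, hence the 48 GB floor
```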

Who is Zhang Di?

Former VP of Technology at Kuaishou (China’s #2 short video platform), where he built the Kling 1.0 and Kling 2.0 video generation models. He moved to Alibaba’s Taotian Group in late 2025 and leads the Future Life Lab that produced Happy Horse.

Why was it submitted anonymously?

This is a common strategy in Chinese AI circles. Anonymous submission to public leaderboards lets the model’s quality speak without bias from the company’s name. DeepSeek used a similar approach. It also generates buzz: “mystery model tops the leaderboard” drives more attention than “Alibaba releases new video model.”

Bottom Line

An e-commerce company’s side project just rewrote the AI video leaderboard. One engineer moved from Kuaishou to Alibaba and topped the models he built at his old job.

When the weights drop, anyone with the hardware can run this locally, and the API opens on April 30 for everyone else. I’m most curious to see what happens to pricing in AI video generation once a free, open-weight model is sitting at #1. Sora couldn’t make the economics work at $120M/month. Happy Horse might not need to.