HeyGen Tutorial

How to Create an AI Avatar Video with HeyGen

HeyGen lets you generate a talking avatar video from a script, a photo, or a short webcam recording, without filming, editing, or production equipment.

What this tutorial covers: choosing between stock avatars, photo-based avatars, and a custom digital twin; writing or pasting a script; choosing a voice and scene settings; and previewing, rendering, and exporting your video.

Prerequisites:

  • A HeyGen account (the free plan allows a limited number of watermarked videos; higher tiers unlock custom avatars and longer renders)
  • A script, or a rough outline you’re comfortable refining inside HeyGen’s editor
  • For a custom digital twin: a webcam or smartphone and about 15–20 seconds of recorded footage

For platform comparisons and pricing, see our Synthesia vs. HeyGen comparison or our full HeyGen review.

Step 1: Start a New Video

From the HeyGen dashboard, either start from scratch or select one of HeyGen’s AI video templates, which are organized by use case (marketing, training, social clips, and more). A template sets up scene structure and pacing that you can then edit; starting from scratch gives you a blank canvas in AI Studio.

Step 2: Choose Your Avatar Type

HeyGen offers three main avatar paths:

  • Stock avatars: a library of pre-built presenters you can use immediately, no setup required
  • Photo-based avatar: upload a single front-facing, well-lit photo and HeyGen animates it into a talking avatar — works with real photos and even stylized images, as long as the facial structure is clearly visible
  • Digital twin (custom avatar): record about 15 seconds of webcam footage to create a photorealistic avatar of yourself that can deliver any script with natural lip-sync

For a photo-based avatar, avoid blurry or low-resolution images — image quality directly affects how accurately the AI tracks expressions and movement.

Step 3: Write or Paste Your Script

In AI Studio, drop your script into the text editor for your chosen scene. Even rough bullet points work as a starting point — HeyGen’s editor allows you to refine wording directly inline. Each scene has its own script block, so longer videos are typically broken into multiple scenes rather than one continuous block of text.
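If your draft is one long document, splitting it into scene-sized blocks before pasting can save time. The sketch below is one possible way to do that offline and does not use any HeyGen API. The function name `split_into_scenes` and the `max_words` threshold are illustrative assumptions, not HeyGen limits; the function simply groups paragraphs until a word budget is reached.

```python
import re


def split_into_scenes(script: str, max_words: int = 120) -> list[str]:
    """Group paragraphs into scene-sized blocks of roughly max_words words.

    max_words is an illustrative assumption, not a documented HeyGen limit.
    A single paragraph longer than max_words is kept whole as its own scene.
    """
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", script) if p.strip()]

    scenes: list[str] = []
    current: list[str] = []
    count = 0
    for para in paragraphs:
        words = len(para.split())
        # Close the current scene if adding this paragraph would exceed the budget.
        if current and count + words > max_words:
            scenes.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        scenes.append("\n\n".join(current))
    return scenes


if __name__ == "__main__":
    draft = "Welcome to the course.\n\nToday we cover onboarding.\n\nLet's begin."
    for number, scene in enumerate(split_into_scenes(draft, max_words=8), start=1):
        print(f"Scene {number}:\n{scene}\n")
```

Each returned string can then be pasted into its own scene's script block. Splitting on paragraph boundaries keeps sentences intact, at the cost of scenes that vary in length.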

Step 4: Choose a Voice

Select a voice from HeyGen’s library, which covers 40+ languages, and preview it against your script before committing. If you’ve created a digital twin with voice cloning, your cloned voice will appear as an option. Alternatively, skip text-to-speech altogether: upload your own audio recording and have the avatar lip-sync to it directly.

Step 5: Customize the Scene

Adjust the avatar’s size and position within the frame, set or upload a background, and add any on-screen media (images, logos, lower-thirds, or branding elements). HeyGen’s avatars are designed to deliver scripts with natural tone and gesture by default. To emphasize specific lines, tools like Voice Director allow finer control over delivery and pacing.

Step 6: Preview and Render

Use the preview function to review the full scene before committing to a render. Once satisfied, click Render (or Generate). Rendering time depends on video length and avatar type. Recent updates to HeyGen’s rendering pipeline have focused on reducing wait times, but long videos with custom digital twins will still take longer to render than short clips using stock avatars.

Step 7: Export and Translate (Optional)

Download the finished video once rendering completes. If you need versions in other languages, HeyGen’s translation feature can dub the video into additional languages while preserving lip-sync — useful for repurposing one video across multiple regional audiences without re-recording.

Settings That Are Easy to Miss

  • Free plan videos are watermarked and limited: the free tier is useful for testing the workflow, but the credit allowance (a small number of videos per month) and watermark make it unsuitable for client-facing or published content — factor this into your plan choice before committing to a production schedule.
  • Photo quality determines avatar realism more than any setting: a high-resolution, evenly lit, front-facing photo with visible facial detail will outperform any post-processing adjustment — if results look off, the fix is usually a better source photo, not a setting change.
  • Digital twin footage requirements are stricter than they look: roughly 15 seconds sounds trivial, but inconsistent lighting, camera movement, or a non-neutral background during that recording can affect the final avatar’s quality — treat the short recording with the same care as a longer one.
  • Credits are consumed by render minutes, not by number of edits: you can revise a script and re-preview as many times as you like, but each full render consumes credits based on output length — it’s worth finalizing the script before rendering rather than rendering repeatedly during the editing process.
  • Voice cloning and digital twins are tied to consent: as with other AI avatar platforms, creating a likeness or voice clone of anyone other than yourself requires their direct participation in the recording process — there’s no upload-only workaround for this.
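To make the point about render minutes concrete, here is a rough back-of-the-envelope sketch for estimating how much output length a script will consume, and how repeated full renders multiply it. The function names and the 150 words-per-minute speaking rate are assumptions for illustration only; HeyGen's actual voice pacing and credit rates are not modeled here, so check your plan's details for real numbers.

```python
def estimate_minutes(script: str, words_per_minute: int = 150) -> float:
    """Estimate spoken length in minutes from word count.

    words_per_minute is an assumed average speaking rate, not a HeyGen value.
    """
    return len(script.split()) / words_per_minute


def estimate_total_rendered_minutes(
    script: str, full_renders: int, words_per_minute: int = 150
) -> float:
    """Total output minutes consumed if the same script is fully rendered
    full_renders times (previews are not counted)."""
    return estimate_minutes(script, words_per_minute) * full_renders


if __name__ == "__main__":
    script = "word " * 300  # stand-in for a 300-word script
    once = estimate_total_rendered_minutes(script, full_renders=1)
    thrice = estimate_total_rendered_minutes(script, full_renders=3)
    print(f"One render: about {once:.1f} min; three renders: about {thrice:.1f} min")
```

Under these assumptions, rendering the same script three times during editing uses three times the output minutes of rendering it once after the script is final.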

Related Reading