
From Code to Content: Automating Blog Posts with AWS Lambda

Tags: Advanced, automation, aws-lambda


Writing technical blog posts with properly syntax-highlighted code screenshots takes hours of manual work — formatting snippets, capturing images, uploading assets, stitching it all together. Multiply that across a content pipeline and you’ve got a bottleneck that no engineering team wants to own.

This tutorial walks through building an AWS Lambda function that automates the entire pipeline: you feed it a code snippet, and it generates a blog post draft via OpenAI, renders syntax-highlighted screenshots with Puppeteer, and uploads everything to S3. By the end, you’ll have a deployable serverless function that turns raw code into publish-ready content in under 30 seconds.

Setting Up the Lambda Handler and IAM Permissions

Before writing any application logic, you need the Lambda function itself and the IAM permissions that let it talk to S3 and invoke external APIs. Skip this step and you’ll burn an hour debugging AccessDenied errors.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3BlogAssetAccess",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::your-blog-assets-bucket/*"
    },
    {
      "Sid": "CloudWatchLogs",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}

Attach this policy to your Lambda execution role. Resist the urge to use s3:* — least privilege matters, especially when the function accepts external input.
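If you manage the execution role outside SAM, the policy above can be attached with the AWS CLI. The role name, policy name, and file path below are placeholders for illustration:

```shell
# Assumes the policy JSON above is saved as blog-generator-policy.json and
# that blog-generator-role is your Lambda execution role (placeholder names).
aws iam put-role-policy \
  --role-name blog-generator-role \
  --policy-name BlogGeneratorS3Access \
  --policy-document file://blog-generator-policy.json
```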

Now for the SAM template that defines the function with enough memory for headless Chromium and a timeout that won’t choke on OpenAI latency:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Blog post generator - code to content pipeline

Globals:
  Function:
    Timeout: 120          # OpenAI + Puppeteer rendering needs headroom
    MemorySize: 1600      # Headless Chromium is memory-hungry

Resources:
  BlogGeneratorFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs18.x
      Architectures:
        - x86_64
      Environment:
        Variables:
          OPENAI_API_KEY: !Sub '{{resolve:ssm:/blog-generator/openai-api-key}}'
          S3_BUCKET_NAME: !Ref BlogAssetsBucket
      Policies:
        - S3CrudPolicy:
            BucketName: !Ref BlogAssetsBucket

  BlogAssetsBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: your-blog-assets-bucket
      OwnershipControls:             # Buckets now default to BucketOwnerEnforced,
        Rules:                       # which rejects ACLs; re-enable them so the
          - ObjectOwnership: BucketOwnerPreferred   # public-read uploads work
      PublicAccessBlockConfiguration:
        BlockPublicAcls: false       # Screenshots need public read access
        IgnorePublicAcls: false
      CorsConfiguration:
        CorsRules:
          - AllowedOrigins: ['*']
            AllowedMethods: [GET]
            AllowedHeaders: ['*']

Outputs:
  FunctionArn:
    Value: !GetAtt BlogGeneratorFunction.Arn

Two things to note: the API key is pulled from AWS Systems Manager Parameter Store rather than hardcoded in the template, and the memory is set to 1,600 MB. Drop below ~1,500 MB and Puppeteer will throw out-of-memory errors during screenshot rendering — this is the single most common deployment failure.
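For the dynamic reference to resolve, the parameter has to exist before you deploy. One way to create it (the parameter name matches the template; the key value is a placeholder):

```shell
# Create the parameter that the template's {{resolve:ssm:...}} reference reads.
# Note: plain {{resolve:ssm:...}} only resolves String parameters. SecureString
# values need {{resolve:ssm-secure:...}}, which Lambda environment variables
# do not support, so this stores the key as a String.
aws ssm put-parameter \
  --name /blog-generator/openai-api-key \
  --type String \
  --value "sk-your-key-here"
```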

// index.js — Lambda entry point
const { randomUUID } = require('crypto');
const { generateContent } = require('./lib/openai');
const { captureScreenshots } = require('./lib/screenshots');
const { uploadToS3 } = require('./lib/s3');

exports.handler = async (event) => {
  // Validate required fields early to avoid burning API credits on bad input.
  // JSON.parse throws on malformed bodies, so guard it rather than letting the
  // invocation crash with an unhandled exception.
  let payload;
  try {
    payload = JSON.parse(event.body);
  } catch {
    return {
      statusCode: 400,
      body: JSON.stringify({ error: 'Request body must be valid JSON' }),
    };
  }
  const { code, filename, language } = payload;

  if (!code || !language) {
    return {
      statusCode: 400,
      body: JSON.stringify({ error: 'Missing required fields: code, language' }),
    };
  }

  const taskId = randomUUID();
  console.log(`[${taskId}] Starting blog generation for ${filename || 'unnamed'} (${language})`);

  try {
    // Step 1: Generate the blog post markdown via OpenAI
    const blogMarkdown = await generateContent({ code, filename, language, taskId });

    // Step 2: Find screenshot markers and render highlighted code blocks
    const screenshots = await captureScreenshots({ code, language, taskId });

    // Step 3: Upload all assets and assemble the final post
    const postUrl = await uploadToS3({ blogMarkdown, screenshots, taskId });

    return {
      statusCode: 200,
      body: JSON.stringify({
        taskId,
        postUrl,
        screenshotCount: screenshots.length,
      }),
    };
  } catch (err) {
    console.error(`[${taskId}] Pipeline failed:`, err);
    return {
      statusCode: 500,
      body: JSON.stringify({ error: 'Generation failed', taskId }),
    };
  }
};

The handler follows a strict three-phase pipeline: generate → capture → upload. Each phase is isolated in its own module so you can test and debug them independently. The taskId threads through every function call and log line, which makes tracing a failed run in CloudWatch straightforward — you filter by a single UUID instead of digging through interleaved logs.

Notice the input validation at the top. Without it, a malformed request still hits OpenAI’s API, and you’re paying for a wasted completion. Fail fast, fail cheap.
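That validation path is easy to check locally without deploying anything. A minimal sketch of the same logic, with a hypothetical `validateRequest` helper that is not part of the handler above:

```javascript
// Hypothetical standalone version of the handler's fail-fast validation.
function validateRequest(rawBody) {
  let parsed;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return { statusCode: 400, error: 'Body is not valid JSON' };
  }
  if (!parsed.code || !parsed.language) {
    return { statusCode: 400, error: 'Missing required fields: code, language' };
  }
  return { statusCode: 200 };
}

console.log(validateRequest('not json').statusCode);                      // 400
console.log(validateRequest('{"code":"x"}').statusCode);                  // 400
console.log(validateRequest('{"code":"x","language":"js"}').statusCode);  // 200
```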

Rendering Code Screenshots with Puppeteer on Lambda

Puppeteer on Lambda is where most people hit a wall. The standard puppeteer package bundles a full Chromium binary that blows past Lambda’s 250 MB deployment limit. The fix is @sparticuz/chromium, a stripped-down Chromium build designed specifically for serverless environments.

Install both packages:

npm install puppeteer-core @sparticuz/chromium

// lib/screenshots.js — Renders syntax-highlighted code to PNG
const chromium = require('@sparticuz/chromium');
const puppeteer = require('puppeteer-core');

// Prism CSS and JS served from CDN to keep the Lambda package small
const PRISM_CSS = 'https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism-tomorrow.min.css';
const PRISM_JS = 'https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js';

function buildHighlightedHtml(code, language) {
  return `<!DOCTYPE html>
<html>
<head>
  <link rel="stylesheet" href="${PRISM_CSS}" />
  <style>
    body {
      margin: 0;
      padding: 24px;
      background: #1e1e1e;
      display: inline-block;  /* Shrink-wrap to content size */
    }
    pre {
      margin: 0;
      border-radius: 8px;
      font-size: 14px;
      line-height: 1.5;
    }
  </style>
</head>
<body>
  <pre><code class="language-${language}">${escapeHtml(code)}</code></pre>
  <script src="${PRISM_JS}"></script>
</body>
</html>`;
}

function escapeHtml(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

async function captureScreenshots({ code, language, taskId }) {
  let browser = null;

  try {
    browser = await puppeteer.launch({
      args: chromium.args,
      defaultViewport: { width: 800, height: 600 },
      executablePath: await chromium.executablePath(),
      headless: chromium.headless,
    });

    const page = await browser.newPage();
    const html = buildHighlightedHtml(code, language);

    await page.setContent(html, { waitUntil: 'networkidle0' });

    // Resize viewport to match actual content dimensions
    const bodyHandle = await page.$('body');
    const boundingBox = await bodyHandle.boundingBox();
    await page.setViewport({
      width: Math.ceil(boundingBox.width),
      height: Math.ceil(boundingBox.height),
    });

    const screenshotBuffer = await page.screenshot({
      type: 'png',
      fullPage: true,
    });

    console.log(`[${taskId}] Screenshot captured: ${screenshotBuffer.length} bytes`);

    return [
      {
        buffer: screenshotBuffer,
        key: `${taskId}/code-screenshot.png`,
        contentType: 'image/png',
      },
    ];
  } finally {
    // Always close the browser — leaked Chromium processes will exhaust Lambda memory
    if (browser) await browser.close();
  }
}

module.exports = { captureScreenshots };

A few things worth calling out. The finally block closing the browser isn’t optional. Lambda reuses execution environments, and a leaked Chromium process from a previous invocation will eat your memory allocation before the next run even starts. You’ll see phantom SIGKILL errors in CloudWatch with no obvious cause — this is almost always the reason.

The viewport-resizing step (measuring the body's bounding box, then calling setViewport with those dimensions) is what produces clean screenshots without excess whitespace. Without it, you get a fixed 800×600 canvas with dead space below short snippets or clipped output on long ones.

One gotcha with @sparticuz/chromium: it ships as a Lambda Layer or a compressed binary that extracts at runtime into /tmp. Lambda gives you 512 MB in /tmp by default. If you’re running multiple screenshot operations in a single invocation, monitor that disk usage. You can increase it to 10 GB in your SAM template by adding an EphemeralStorage property to the function resource:

  BlogGeneratorFunction:
    Type: AWS::Serverless::Function
    Properties:
      EphemeralStorage:
        Size: 1024   # MB — bump this if /tmp fills up during multi-screenshot runs

Uploading Assets to S3

The upload module is straightforward, but pay attention to the ContentType and ACL settings. Get these wrong and your screenshots either won't render in browsers or won't be publicly accessible. Note that the public-read ACL only takes effect if the bucket has ACLs enabled; buckets created with the default BucketOwnerEnforced ownership setting reject ACLs outright:

// lib/s3.js — Upload generated assets to S3
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({});
const BUCKET = process.env.S3_BUCKET_NAME;

async function uploadToS3({ blogMarkdown, screenshots, taskId }) {
  // Upload each screenshot with public-read access
  for (const shot of screenshots) {
    await s3.send(new PutObjectCommand({
      Bucket: BUCKET,
      Key: shot.key,
      Body: shot.buffer,
      ContentType: shot.contentType,
      ACL: 'public-read',
    }));
    console.log(`[${taskId}] Uploaded: ${shot.key}`);
  }

  // Upload the markdown post itself
  const markdownKey = `${taskId}/post.md`;
  await s3.send(new PutObjectCommand({
    Bucket: BUCKET,
    Key: markdownKey,
    Body: blogMarkdown,
    ContentType: 'text/markdown',
  }));

  const region = await s3.config.region();
  return `https://${BUCKET}.s3.${region}.amazonaws.com/${markdownKey}`;
}

module.exports = { uploadToS3 };

Deploy the whole stack with a single command:

sam build && sam deploy --guided

The --guided flag walks you through stack naming, region selection, and IAM capability acknowledgment on the first deploy. Subsequent deploys remember your choices in samconfig.toml.
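Once the stack is up, you can smoke-test the function directly with the AWS CLI. The function name below assumes SAM's default naming convention and is a placeholder:

```shell
# Invoke with an API-Gateway-style event; the handler reads event.body as a JSON string.
aws lambda invoke \
  --function-name blog-generator-BlogGeneratorFunction-XXXXXXXX \
  --cli-binary-format raw-in-base64-out \
  --payload '{"body": "{\"code\": \"console.log(1)\", \"language\": \"javascript\"}"}' \
  response.json

cat response.json
```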

Key Takeaways

  • Use @sparticuz/chromium instead of full Puppeteer — the standard Chromium binary exceeds Lambda’s deployment size limit. The stripped-down package is purpose-built for serverless and extracts into /tmp at runtime.
  • Set Lambda memory to at least 1,600 MB — headless Chromium is the bottleneck. Below ~1,500 MB you’ll hit out-of-memory crashes that surface as cryptic Runtime.ExitError messages with no useful stack trace.
  • Always close the browser in a finally block — Lambda reuses execution environments, so leaked Chromium processes persist across invocations and silently consume your memory allocation.
  • Pull secrets from SSM Parameter Store, not environment variables — hardcoded API keys in SAM templates end up in CloudFormation state and version control. SSM dynamic references resolve at deploy time and stay out of your codebase.
  • Thread a taskId through every function and log line — when a run fails at 2 AM, a single CloudWatch Insights query filtered by UUID gets you from symptom to root cause in seconds instead of minutes.
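As a concrete example of that last point, here is a CloudWatch Logs Insights query sketch that pulls every log line for a single run (the taskId value is a placeholder):

```
fields @timestamp, @message
| filter @message like "your-task-id-here"
| sort @timestamp asc
```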