Transcription Service

Example Podcast Transcription

Response Formats

  • speaker_organized
  • single_paragraph
  • raw

Key Features

  1. 100% Serverless and Edge-Optimized: Pay only for what you use with CloudFront, CloudFront Key Value Store, AWS Lambda, and S3.

  2. Deepgram Integration: Utilize the powerful Speech-to-Text API with support for 36 languages.

  3. Speaker Diarization: Transcripts clearly identify speakers, ideal for meetings, interviews, and podcasts.

  4. Custom Domain Configuration: Set up a custom domain for a professional and accessible transcription service.

  5. Flexible Deployment Options: Choose local deployment for easy management or automate with GitHub Actions for seamless AWS integration.

  6. API Key Security: Secure access with API key generation and management using CloudFront Key Value Store.

  7. Synchronous and Asynchronous API Endpoints: Instant transcriptions for short audio files and background processing for longer ones.

Architecture

[Architecture diagram: transcription-service]

Helper Functions

  • Makefile - Contains all commands mentioned in README
  • Download YouTube video
  • Convert MP4 to MP3
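
As a rough illustration of the MP4-to-MP3 step, the sketch below assembles an ffmpeg command line in Python. The use of ffmpeg, the function name build_mp3_command, the bitrate, and the file name are all assumptions; the repository's Makefile may rely on different tools or flags.

```python
import shlex
from pathlib import Path


def build_mp3_command(source: str, bitrate: str = "128k") -> list[str]:
    """Return an ffmpeg argument list that extracts the audio track as MP3.

    Hypothetical helper, not the repository's actual Makefile target.
    """
    target = Path(source).with_suffix(".mp3")
    return [
        "ffmpeg",
        "-i", source,               # input video file
        "-vn",                      # drop the video stream
        "-codec:a", "libmp3lame",   # encode the audio as MP3
        "-b:a", bitrate,            # audio bitrate (assumed default)
        str(target),                # output path next to the input
    ]


command = build_mp3_command("episode.mp4")
print(shlex.join(command))
```

The list can be passed to subprocess.run(command, check=True) on a machine where ffmpeg is installed.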

Prerequisites

Before you begin, ensure you have the following:

Deployment Options

  1. Deploy via GitHub Actions
  2. Deploy infrastructure from your local environment

GitHub Actions: Steps to Deploy

Step 1: Create a Deepgram Account and API Key

  1. Sign up for an account at Deepgram.
  2. Create an API key within the Deepgram dashboard.

Step 2: Upload Deepgram API Key to AWS Parameter Store

  1. Replace 'your-deepgram-api-key' with your actual Deepgram API key.
  2. Run the following command to store your API key in AWS Parameter Store:
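
For readers who prefer Python over the CLI, the same step could look roughly like the sketch below. It assumes a boto3 SSM client is passed in; the parameter path /transcription-service/deepgram-api-key and the SecureString type are hypothetical, so use the name and type the stack actually reads.

```python
def store_deepgram_key(
    ssm_client,
    api_key: str,
    name: str = "/transcription-service/deepgram-api-key",  # hypothetical path
) -> dict:
    """Store the Deepgram API key in AWS Systems Manager Parameter Store."""
    if not api_key or api_key == "your-deepgram-api-key":
        raise ValueError("replace the placeholder with your actual Deepgram API key")
    return ssm_client.put_parameter(
        Name=name,
        Value=api_key,
        Type="SecureString",  # assumption: the stack expects an encrypted parameter
        Overwrite=True,       # lets you rotate the key by re-running
    )


# Usage (needs boto3 and AWS credentials):
#   import boto3
#   store_deepgram_key(boto3.client("ssm"), "<your key>")
```

Passing the client in keeps the function easy to exercise with a stub before touching a real account.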

Step 3: Fork & Clone Repository

  1. Navigate to https://github.com/whatthecloud-io/transcription-service, click "Fork", then create the repository under your account.
  2. Clone the repository

Step 4: Create GitHub OIDC Provider in Your AWS Account

  1. Replace YourGithubUserName with your GitHub username or org.
  2. Run the command below to create a GitHub OIDC provider and an IAM role named github-oidc-deploy-role with Administrator permissions (feel free to tighten these to meet your security practices).

github-oidc-deploy-role will be assumed by your GitHub workflow to deploy the infrastructure to your AWS Account.
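
To make the role assumption concrete, here is a sketch of the kind of trust policy such a role typically carries. The account ID is a placeholder and scoping the subject to repo:YourGithubUserName/* is an assumption; the template in the repository may use a narrower or different condition.

```python
import json


def github_oidc_trust_policy(account_id: str, github_owner: str) -> dict:
    """Build an IAM trust policy that lets GitHub Actions assume a role via OIDC."""
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # The token must be issued for AWS STS.
                    "StringEquals": {f"{provider}:aud": "sts.amazonaws.com"},
                    # Assumed scope: any repository under this user or org.
                    "StringLike": {f"{provider}:sub": f"repo:{github_owner}/*"},
                },
            }
        ],
    }


policy = github_oidc_trust_policy("123456789012", "YourGithubUserName")
print(json.dumps(policy, indent=2))
```

Narrowing the subject to a single repository, for example repo:YourGithubUserName/transcription-service:*, limits which workflows can assume the role.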

Step 5: Update .github/workflows/pipeline.yaml

  1. Open .github/workflows/pipeline.yaml, uncomment #- main, and replace aws-account-id with your AWS account ID.

This triggers a deployment once the change is pushed. Navigate to the Actions tab in GitHub, click the deployment run, then click the job to see the list of deployment steps.

Once deployed, CloudFrontDistributionUrl or CustomDomainName appears at the bottom of the Deploy SAM application step. You will need one of them during Post Deployment Steps to Use the Transcription Service.

Optional - Configure Custom Domain

If you want to use a custom domain instead of the default CloudFront domain, configure the following parameters in .github/workflows/pipeline.yaml:

  • hosted-zone-id: The ID of your hosted zone in Route 53.
  • hosted-zone-name: The domain name (e.g., example.com).
  • subdomain: The subdomain for your service (e.g., transcribe).

The above example would produce the domain https://transcribe.example.com
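
The way the two values combine can be sketched in a few lines of Python. The function name custom_domain_url is hypothetical, and the assumption is that the template simply joins the subdomain and the hosted zone name with a dot.

```python
def custom_domain_url(subdomain: str, hosted_zone_name: str) -> str:
    """Join a subdomain and a Route 53 hosted zone name into the service URL."""
    # Route 53 often shows zone names with a trailing dot; drop it.
    zone = hosted_zone_name.strip().rstrip(".").lower()
    sub = subdomain.strip().strip(".").lower()
    if not zone or not sub:
        raise ValueError("both subdomain and hosted zone name are required")
    return f"https://{sub}.{zone}"


print(custom_domain_url("transcribe", "example.com"))  # https://transcribe.example.com
```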

Now go to Post Deployment Steps to Use the Transcription Service.

Local: Steps to Deploy

Step 1: Create a Deepgram Account and API Key

  1. Sign up for an account at Deepgram.
  2. Create an API key within the Deepgram dashboard.

Step 2: Upload Deepgram API Key to AWS Parameter Store

  1. Replace 'your-deepgram-api-key' with your actual Deepgram API key.
  2. Run the following command to store your API key in AWS Parameter Store:

Step 3: Fork & Clone Repository

  1. Navigate to https://github.com/whatthecloud-io/transcription-service, click "Fork", then create the repository under your account.
  2. Clone the repository

Step 4: Deploy CloudFormation Stack

Run the following command to build and deploy the CloudFormation stack (Docker should be running):

Once deployed, you will notice an Outputs section in your terminal. You will need CloudFrontDistributionUrl or CustomDomainName in a later step.

Optional - Configure Custom Domain

If you want to use a custom domain instead of the default CloudFront domain, configure the following parameters:

  • HostedZoneId: The ID of your hosted zone in Route 53 (e.g., Z2FDTNDATAQYW2).
  • HostedZoneName: The domain name (e.g., example.com).
  • SubDomain: The subdomain for your service (e.g., transcribe).

The above example would produce the domain https://transcribe.example.com
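
One way these parameters might be handed to the deployment is as SAM CLI parameter overrides. The sketch below only assembles the command line; whether the Makefile passes the values this way is an assumption, and sam_deploy_command is a hypothetical helper.

```python
import shlex


def sam_deploy_command(hosted_zone_id: str, hosted_zone_name: str, sub_domain: str) -> list[str]:
    """Build a `sam deploy` argument list with custom-domain parameter overrides."""
    overrides = {
        "HostedZoneId": hosted_zone_id,
        "HostedZoneName": hosted_zone_name,
        "SubDomain": sub_domain,
    }
    return [
        "sam", "deploy",
        "--parameter-overrides",
        *[f"{key}={value}" for key, value in overrides.items()],
    ]


deploy = sam_deploy_command("Z2FDTNDATAQYW2", "example.com", "transcribe")
print(shlex.join(deploy))
```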

Now continue with Post Deployment Steps to Use the Transcription Service.

Post Deployment Steps to Use the Transcription Service

Step 1: Install Python Dependencies

Ensure you have the necessary Python dependencies installed:

Step 2: Generate Test API Key

Run the following command to generate a test API key and insert it into the CloudFront Key Value Store:
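
What such a script does can be sketched as follows, assuming a boto3 cloudfront-keyvaluestore client is passed in. Storing the API key as the entry's key with a free-form label as its value is a hypothetical layout; the repository's script may organize entries differently.

```python
import secrets


def create_api_key(kvs_client, kvs_arn: str, label: str = "test") -> str:
    """Generate a random API key and write it to a CloudFront Key Value Store."""
    api_key = secrets.token_urlsafe(32)
    # Writes are conditional on the store's current ETag.
    etag = kvs_client.describe_key_value_store(KvsARN=kvs_arn)["ETag"]
    kvs_client.put_key(
        KvsARN=kvs_arn,
        Key=api_key,   # assumed layout: the API key itself is the lookup key
        Value=label,   # assumed layout: the value is a descriptive label
        IfMatch=etag,
    )
    return api_key


# Usage (needs boto3 with SigV4A support and AWS credentials):
#   import boto3
#   key = create_api_key(boto3.client("cloudfront-keyvaluestore"), "<your KVS ARN>")
```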

Step 3: Transcribe Audio Files

Synchronous Transcription

  1. Replace your-domain with either CloudFrontDistributionUrl or CustomDomainName from the CloudFormation stack outputs.
  2. Replace your-generated-api-key with the API key you generated in Step 2.

To transcribe a 5-minute Apple Intelligence MP3 file synchronously, run:
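
A request of this kind could be built in Python roughly as shown below. Only the /transcribe/sync path and the response_format options come from this guide; the x-api-key header name, the JSON body with a url field, and the sample audio URL are assumptions about the request contract, so check them against the repository before relying on them.

```python
import json
import urllib.request


def build_sync_request(
    domain: str,
    api_key: str,
    audio_url: str,
    response_format: str = "speaker_organized",
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the synchronous endpoint."""
    # Accept the domain with or without a scheme or trailing slash.
    host = domain.removeprefix("https://").rstrip("/")
    body = json.dumps(
        {"url": audio_url, "response_format": response_format}  # assumed body shape
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{host}/transcribe/sync",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # assumed header name
        },
    )


request = build_sync_request(
    "your-domain", "your-generated-api-key", "https://example.com/audio.mp3"
)
# To send it: urllib.request.urlopen(request, timeout=30)
```

Building the request separately from sending it makes it easy to inspect the URL and headers first.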

This will generate a transcript of the audio file and return the result:

Asynchronous Transcription

  1. Replace your-domain and your-generated-api-key again.

To transcribe a 19-minute Total-Microsoft-Recall MP3 file asynchronously, run:

This will return a job ID for the transcription task:

Step 4: Retrieve Asynchronous Transcription Result

  1. Replace your-domain and your-generated-api-key again.
  2. Replace job-id with job_id from the previous request response.

This will return the job status and, once the job is complete, the transcription result for the specified job ID:
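
Because the result is only available once the job finishes, a client usually polls. The loop below is a minimal sketch: fetch_result stands in for whatever call you make to /transcribe/result, and the status field with the values completed and failed is hypothetical, since the response schema is not shown here.

```python
import time
from typing import Callable


def wait_for_transcript(
    fetch_result: Callable[[str], dict],
    job_id: str,
    interval: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll for a job's result until it completes, fails, or attempts run out."""
    for _ in range(max_attempts):
        result = fetch_result(job_id)
        status = result.get("status")  # assumed field name
        if status == "completed":      # assumed status value
            return result
        if status == "failed":         # assumed status value
            raise RuntimeError(f"transcription job {job_id} failed")
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} attempts")
```

Passing sleep as a parameter keeps the loop easy to test without real delays.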

Note

  • Requests made to the /transcribe/sync endpoint must complete within 30 seconds; otherwise, they will time out. If you expect the transcription to take longer than 30 seconds, use the /transcribe/async and /transcribe/result endpoints.
  • The response_format parameter supports the following options: speaker_organized, single_paragraph, raw.
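
The two rules in this note can be captured in a small client-side check. The endpoint paths, the 30-second limit, and the format names come from the note itself; the helper names and the idea of estimating processing time up front are illustrative assumptions.

```python
RESPONSE_FORMATS = ("speaker_organized", "single_paragraph", "raw")
SYNC_TIMEOUT_SECONDS = 30


def choose_endpoint(expected_processing_seconds: float) -> str:
    """Pick the sync endpoint only when the request should finish within the limit."""
    if expected_processing_seconds < SYNC_TIMEOUT_SECONDS:
        return "/transcribe/sync"
    return "/transcribe/async"


def validate_response_format(value: str) -> str:
    """Return the format unchanged if supported, otherwise raise ValueError."""
    if value not in RESPONSE_FORMATS:
        allowed = ", ".join(RESPONSE_FORMATS)
        raise ValueError(f"unsupported response_format {value!r}; expected one of: {allowed}")
    return value


print(choose_endpoint(10))   # /transcribe/sync
print(choose_endpoint(120))  # /transcribe/async
```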

Wrap up

Following the steps above deploys the transcription service and gets you transcribing audio. Update the parameters and variables in the Makefile to match your environment. If you encounter any issues, refer to the AWS and Deepgram documentation for further assistance.

License

This template is a commercial product and is licensed under the WhatTheCloud License.

transcription-service

Serverless, edge-optimized transcription service leveraging AWS CloudFront, Lambda, and Deepgram. Implements API Key authentication via CloudFront Key Value Store and Functions. Provides synchronous and asynchronous endpoints for MP3 transcription, with results stored in S3. Ideal for processing podcasts, meetings, and interviews.

$149

one-time payment

  • Automated AWS Deployments via GitHub Actions Workflow
  • Full Access to GitHub Repository
  • CloudFormation Scripts
  • Deployment Guides