Transcription Service

Example Podcast Transcription

Response Formats

speaker_organized

single_paragraph

raw

Key Features

100% Serverless and Edge-Optimized: Pay only for what you use with CloudFront, CloudFront Key Value Store, AWS Lambda, and S3.
Deepgram Integration: Uses Deepgram's Speech-to-Text API, with support for 36 languages.
Speaker Diarization: Transcripts clearly identify speakers, ideal for meetings, interviews, and podcasts.
Custom Domain Configuration: Set up a custom domain for a professional and accessible transcription service.
Flexible Deployment Options: Choose local deployment for easy management or automate with GitHub Actions for seamless AWS integration.
API Key Security: Secure access with API key generation and management using CloudFront Key Value Store.
Synchronous and Asynchronous API Endpoints: Instant transcriptions for short audio files and background processing for longer ones.

Architecture

[Figure: transcription-service architecture diagram]

Helper Functions

Makefile - Contains all commands mentioned in README
Download YouTube video
Convert MP4 to MP3

Prerequisites

Before you begin, ensure you have the following:

An AWS account
AWS CLI installed and configured - Install the AWS CLI
AWS SAM CLI installed - Install the SAM CLI
Python installed
Docker installed - Install Docker

Deployment Options

Deploy via GitHub Actions
Deploy infrastructure from your local environment

GitHub Actions: Steps to Deploy

Step 1: Create a Deepgram Account and API Key

Sign up for an account at Deepgram.
Create an API key within the Deepgram dashboard.

Step 2: Upload Deepgram API Key to AWS Parameter Store

Replace 'your-deepgram-api-key' with your actual Deepgram API key.
Run the following command to store your API key in AWS Parameter Store:
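
For readers who prefer to script this step, here is a minimal Python sketch that assembles an `aws ssm put-parameter` call. The parameter name `/transcription-service/deepgram-api-key` is a hypothetical placeholder, and storing the key as a `SecureString` is an assumption; use the name and type the repository's template and Makefile expect.

```python
import shlex


def build_put_parameter_command(name: str, api_key: str) -> list[str]:
    """Return the argv for an `aws ssm put-parameter` call that stores api_key."""
    if not api_key or api_key == "your-deepgram-api-key":
        raise ValueError("replace the placeholder with your actual Deepgram API key")
    return [
        "aws", "ssm", "put-parameter",
        "--name", name,
        "--value", api_key,
        "--type", "SecureString",  # assumption: the key is stored encrypted
        "--overwrite",             # lets you re-run the command to rotate the key
    ]


# Hypothetical parameter name and key, for illustration only.
command = build_put_parameter_command(
    "/transcription-service/deepgram-api-key", "dg_example_key"
)
print(shlex.join(command))
```

The printed command can be pasted into a terminal, or the list can be passed to `subprocess.run(command, check=True)`.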

Step 3: Fork & Clone Repository

Navigate to https://github.com/whatthecloud-io/transcription-service, click "Fork", then create the repository under your account.
Clone the repository

Step 4: Create GitHub OIDC Provider in Your AWS Account

Replace YourGithubUserName with your GitHub username or org.
Run the command below to create a GitHub OIDC Provider and an IAM Role named github-oidc-deploy-role, which has Administrator permissions (feel free to narrow these to meet your security practices).

github-oidc-deploy-role will be assumed by your GitHub workflow to deploy the infrastructure to your AWS Account.
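
To make the role assumption concrete, the sketch below builds the kind of trust policy a GitHub OIDC deploy role typically carries. It is an illustration, not a copy of this repository's template: the function name is hypothetical, and the `repo:<owner>/*` condition is an assumption about how the role is scoped.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, github_owner: str) -> dict:
    """Build a trust policy letting GitHub workflows under github_owner assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # The token must have been issued for AWS STS.
                    "StringEquals": {f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com"},
                    # Assumption: any repository under this user or org may deploy.
                    "StringLike": {f"{GITHUB_OIDC_HOST}:sub": f"repo:{github_owner}/*"},
                },
            }
        ],
    }


# Placeholder account ID and owner, for illustration only.
print(json.dumps(github_oidc_trust_policy("123456789012", "YourGithubUserName"), indent=2))
```

Restricting the `sub` condition to a single repository (for example `repo:YourGithubUserName/transcription-service:*`) is one way to tighten the role beyond the default.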

Step 5: Update .github/workflows/pipeline.yaml

Open .github/workflows/pipeline.yaml, uncomment #- main, and replace aws-account-id with your AWS account ID.

This will trigger a deployment. To follow its progress, open the Actions tab in GitHub, click the workflow run, then click the job to see the list of deployment steps.

Once deployed, you will find CloudFrontDistributionUrl or CustomDomainName at the bottom of the Deploy SAM application step. You will need one of them in Post Deployment Steps to Use the Transcription Service.

Optional - Configure Custom Domain

If you want to use a custom domain instead of the default CloudFront domain, configure the following parameters in .github/workflows/pipeline.yaml:

hosted-zone-id: The ID of your hosted zone in Route 53.
hosted-zone-name: The domain name (e.g., example.com).
subdomain: The subdomain for your service (e.g., transcribe).

The above example would produce the domain https://transcribe.example.com
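
The way the subdomain and hosted zone name combine can be sketched in a few lines of Python. The helper name `custom_domain_url` is hypothetical and only mirrors the example above; the deployed template performs the actual composition.

```python
def custom_domain_url(subdomain: str, hosted_zone_name: str) -> str:
    """Join a subdomain and a hosted zone name into the service URL."""
    # Route 53 often shows zone names with a trailing dot ("example.com.").
    zone = hosted_zone_name.strip().rstrip(".").lower()
    sub = subdomain.strip().strip(".").lower()
    if not zone or not sub:
        raise ValueError("both subdomain and hosted zone name are required")
    return f"https://{sub}.{zone}"


print(custom_domain_url("transcribe", "example.com"))
```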

Now go to `Post Deployment Steps to Use the Transcription Service`

Local: Steps to Deploy

Step 1: Create a Deepgram Account and API Key

Sign up for an account at Deepgram.
Create an API key within the Deepgram dashboard.

Step 2: Upload Deepgram API Key to AWS Parameter Store

Replace 'your-deepgram-api-key' with your actual Deepgram API key.
Run the following command to store your API key in AWS Parameter Store:

Step 3: Fork & Clone Repository

Navigate to https://github.com/whatthecloud-io/transcription-service, click "Fork", then create the repository under your account.
Clone the repository

Step 4: Deploy CloudFormation Stack

Run the following command to build and deploy the CloudFormation stack (Docker should be running):

Once deployed, you will notice an Outputs section in your terminal. You will need CloudFrontDistributionUrl or CustomDomainName in a later step.

Optional - Configure Custom Domain

If you want to use a custom domain instead of the default CloudFront domain, configure the following parameters:

HostedZoneId: The ID of your hosted zone in Route 53 (e.g., Z2FDTNDATAQYW2).
HostedZoneName: The domain name (e.g., example.com).
SubDomain: The subdomain for your service (e.g., transcribe).

The above example would produce the domain https://transcribe.example.com

Now continue with `Post Deployment Steps to Use the Transcription Service`

Post Deployment Steps to Use the Transcription Service

Step 1: Install Python Dependencies

Ensure you have the necessary Python dependencies installed:

Step 2: Generate Test API Key

Run the following command to generate a test API key and insert it into the CloudFront Key Value Store:

Step 3: Transcribe Audio Files

Synchronous Transcription

Replace your-domain with either CloudFrontDistributionUrl or CustomDomainName from the CloudFormation stack outputs.
Replace your-generated-api-key with the API key you generated in Step 2.

To transcribe a 5-minute Apple Intelligence MP3 file synchronously, run:

This will generate a transcript of the audio file and return the result:
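
A synchronous request of this kind can be sketched with the Python standard library. Only the /transcribe/sync path and the response_format parameter come from this README; the `x-api-key` header name, the JSON body, and its `url` field are assumptions made for illustration, so check the repository's Makefile for the real request shape.

```python
import json
import urllib.request


def build_sync_request(domain: str, api_key: str, audio_url: str,
                       response_format: str = "speaker_organized") -> urllib.request.Request:
    """Build (but do not send) a POST request to the synchronous endpoint."""
    # Accept either a bare domain or a full https:// URL from the stack outputs.
    host = domain.removeprefix("https://").rstrip("/")
    # Assumption: the service takes a JSON body with the audio URL and format.
    body = json.dumps({"url": audio_url, "response_format": response_format})
    return urllib.request.Request(
        url=f"https://{host}/transcribe/sync",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # assumption: hypothetical header name
        },
        method="POST",
    )


request = build_sync_request(
    "your-domain", "your-generated-api-key", "https://example.com/audio.mp3"
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request, timeout=30)
```

The 30-second timeout in the final comment matches the limit on the synchronous endpoint.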

Asynchronous Transcription

Replace your-domain and your-generated-api-key again.

To transcribe a 19-minute Total-Microsoft-Recall MP3 file asynchronously, run:

This will return a job ID for the transcription task:

Step 4: Retrieve Asynchronous Transcription Result

Replace your-domain and your-generated-api-key again.
Replace job-id with job_id from the previous request response.

This will return the job status and, once processing is complete, the transcript for the specified job ID.
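
Because the result is only available once the job finishes, a client usually polls for it. The sketch below shows one way to do that; the `status` field and its `"completed"` value are assumptions about the response shape, and `fetch_result` is a hypothetical callable you supply that requests /transcribe/result for a job ID and returns the parsed JSON.

```python
import time
from typing import Callable


def wait_for_transcript(fetch_result: Callable[[str], dict], job_id: str,
                        interval_seconds: float = 5.0, max_attempts: int = 60) -> dict:
    """Poll fetch_result(job_id) until the job reports completion."""
    for attempt in range(max_attempts):
        result = fetch_result(job_id)
        # Assumption: the response carries a "status" field set to "completed".
        if result.get("status") == "completed":
            return result
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} did not complete after {max_attempts} attempts")


# Stand-in for a real HTTP call: the job finishes on the second poll.
fake_responses = iter([
    {"status": "processing"},
    {"status": "completed", "transcript": "hello world"},
])
final = wait_for_transcript(lambda job_id: next(fake_responses), "job-id", interval_seconds=0)
print(final["transcript"])
```

Passing the fetch function in keeps the polling logic independent of any HTTP library and easy to test.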

Note:

Requests made to the /transcribe/sync endpoint must complete within 30 seconds; otherwise, they will time out. If you expect the transcription to take longer than 30 seconds, use the /transcribe/async and /transcribe/result endpoints.

The response_format parameter supports the following options: speaker_organized, single_paragraph, raw.
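
The two rules in this note can be expressed as small client-side checks. This is a sketch only: the helper names are hypothetical, and estimating how long a transcription will take is left to the caller.

```python
RESPONSE_FORMATS = ("speaker_organized", "single_paragraph", "raw")
SYNC_LIMIT_SECONDS = 30  # requests to /transcribe/sync must finish within this window


def choose_endpoint(expected_processing_seconds: float) -> str:
    """Pick the sync endpoint only when the work should finish inside the limit."""
    if expected_processing_seconds < SYNC_LIMIT_SECONDS:
        return "/transcribe/sync"
    return "/transcribe/async"


def validate_response_format(value: str) -> str:
    """Return value unchanged if it is a supported response_format, else raise."""
    if value not in RESPONSE_FORMATS:
        raise ValueError(
            f"unsupported response_format {value!r}; expected one of {RESPONSE_FORMATS}"
        )
    return value


print(choose_endpoint(10), choose_endpoint(120), validate_response_format("raw"))
```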

Wrap up

By following the steps outlined above, you will deploy and use the transcription service successfully. Make sure to update the necessary parameters and variables in the Makefile as needed. If you encounter any issues, refer to the AWS and Deepgram documentation for further assistance.

License

This template is a commercial product and is licensed under the WhatTheCloud License.
