Terraform for Serverless Series: Enhanced Management of Amazon S3 Websites
September 28th, 2018 / 5 min read
Last week I started working on a new series of stories focused on Terraform for serverless. In the first article I took the Serverless Applications with AWS Lambda and API Gateway user guide and improved it with Enhanced Management of AWS Lambda Functions. Today I would like to double down on this approach and describe in detail how our team provisions and deploys an Amazon S3 bucket with the website hosting feature enabled, for static web pages or single page applications.
Terraform for Amazon S3
As in the previous blog post, it should go without saying (or at least be stated up front): my personal preference is to separate Terraform configurations into group / type / service / function specific .tf files. So, normally, I would define my providers in provider.tf, set up my data sources in data.tf, specify my variables in variables.tf, configure my resources in main.tf, and so on. But for the purposes of this article I'm putting everything into one single .tf file, setting best practices aside as long as it works:
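A minimal sketch of such a configuration (the bucket name and region are assumptions, and the inline `acl`, `website`, and `cors_rule` blocks follow the AWS provider syntax of the article's era, before provider v4 split them into separate resources):

```hcl
provider "aws" {
  region = "us-east-1" # assumed region
}

resource "aws_s3_bucket" "website" {
  bucket = "example-website-bucket" # assumed bucket name
  acl    = "public-read"            # read-only access for anyone

  # Feature 1: static website hosting
  website {
    index_document = "index.html"
    error_document = "404.html"
  }

  # Feature 2: cross-origin resource sharing
  cors_rule {
    allowed_headers = ["*"]
    allowed_methods = ["GET", "HEAD"]
    allowed_origins = ["*"]
    max_age_seconds = 3000
  }
}
```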
Simply put, the code above creates an Amazon S3 bucket with two features enabled: (1) website hosting and (2) cross-origin resource sharing. The access control list grants read-only permissions to anyone.
Note that index.html and 404.html will not be magically uploaded to the newly created bucket by default. Furthermore, in order to update our website with new content, we would need to upload the updated files / assets to the S3 bucket, including (but not limited to) HTML, scripts, stylesheets, images, etc. Therefore, the first upload and all subsequent updates must be implemented separately, because these operations are not managed natively by Terraform.
Terraform for Website Build
For the AWS Lambda function we implemented Terraform's external and null_data_source data configurations. The goal of the external data source is to trigger a custom script that executes the build process, while the null data source is referenced in the Terraform resource configuration as a trigger on terraform plan or terraform apply actions. In order to link them together natively, the null_data_source data configuration depends on the external data source.
Now, the main trick of this approach is who triggers the code execution. In the case of the Terraform configuration for AWS Lambda, this job is managed by source_code_hash. This attribute connects the aws_lambda_function resource to the build.js script through the external data source. We use the base64sha256 function on the example.zip file as the trigger, so every time the .zip changes during terraform apply, the corresponding AWS Lambda function is updated.
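The wiring can be sketched roughly as follows (written in modern Terraform syntax; the build.js output keys, the role variable, and the runtime are illustrative assumptions, not the original configuration):

```hcl
# External data source: runs the build script during plan/apply.
# Assumes build.js prints a JSON object such as {"status": "..."}.
data "external" "build" {
  program = ["node", "${path.module}/build.js"]
}

# Null data source: depends on the external data source, linking
# the build step into the resource graph.
data "null_data_source" "build" {
  inputs = {
    build_status = data.external.build.result["status"]
  }
}

resource "aws_lambda_function" "example" {
  function_name = "example"
  filename      = "${path.module}/example.zip"
  handler       = "index.handler"
  runtime       = "nodejs12.x"        # assumed runtime
  role          = var.lambda_role_arn # assumed variable

  # Re-deploy whenever the content of the .zip changes.
  source_code_hash = filebase64sha256("${path.module}/example.zip")
}
```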
Let's apply the same approach to terraform configuration for Amazon S3:
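A sketch of what this looks like, in modern Terraform syntax (the `aws_s3_bucket.website` reference, the `build_path` variable, and the environment variable names are assumptions):

```hcl
resource "null_resource" "website_build" {
  triggers = {
    # timestamp() yields a new value on every run, forcing
    # the provisioner to fire each time.
    build_trigger = timestamp()
  }

  provisioner "local-exec" {
    command = "bash ${path.module}/build.sh"

    environment = {
      BUILD_PATH = var.build_path              # assumed variable
      S3_PATH    = aws_s3_bucket.website.bucket # assumed resource name
    }
  }
}
```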
Terraform's S3 bucket resource configuration doesn't offer any native triggers, so we decided to use a null_resource with triggers and a provisioner instead. In this case the provisioner is the one executing the build.sh script, while the triggers are tied to a timestamp value which, in theory, is different on every run. This configuration ensures the build process executes on every terraform plan and terraform apply action.
Let's take a look at the build.sh script:
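A minimal sketch of what such a script might look like (the BUILD_PATH and S3_PATH variable names, the npm commands, and the local build directory are assumptions, not the original script; the real script would end by invoking `main "$@"`, left out here so the individual functions can be exercised in isolation):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Wrap all messages in JSON so terraform plan/apply can surface them cleanly.
json_msg() {
  printf '{"message": "%s"}\n' "$1"
}

# Part 1: validate input variables.
validate() {
  for var in BUILD_PATH S3_PATH; do
    if [[ -z "${!var:-}" ]]; then
      json_msg "error: required variable ${var} is not set"
      return 1
    fi
  done
}

# Part 2: application-specific build process, managed by npm.
run_build() {
  (cd "${BUILD_PATH}" && npm install && npm run build)
}

# Part 3: upload the newly created build to the S3 bucket via awscli.
upload() {
  aws s3 sync "${BUILD_PATH}/build" "s3://${S3_PATH}" --delete
}

# Part 4: a simple echo to show that everything was successful.
main() {
  validate || exit 1
  run_build
  upload
  json_msg "success: build uploaded to s3://${S3_PATH}"
}

# The real build.sh ends with: main "$@"
```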
In this article we intentionally used bash instead of Node.js to show that the build process can be language agnostic. The first part of the script validates the input variables. The second part is the application-specific build process, which in our case is managed by npm. The third part uses awscli to upload the newly created build to the Amazon S3 bucket. The last part is a simple echo to show that everything was successful. Note that all messages are wrapped in JSON format to make them compatible with Terraform actions like plan or apply. In particular, in case of a build failure, developers will be able to see the corresponding failure message in its own Terraform output.
Still Needs Improvements
Unfortunately, the code above is not perfect. We are aware of the following issues:
- build.sh will be executed on every terraform plan action (as well as apply or destroy if a plan is not passed as an input variable); we are working to optimize this by checking whether any file in build_path has changed recently compared with the timestamp of the corresponding file (or files) in s3_path
- this implementation triggers builds on a timestamp; we are working to optimize and improve it by comparing the timestamp of the previous build with the triggers value generated by Terraform
Spoiler alert: all the steps and workarounds described in this article are being carefully crafted into language-agnostic functionality that will be released soon as a new feature in our open source project, TerraHub CLI.
We would love to hear thoughts and comments on what could be done better.