AWS RDS cross account snapshot restoration

Many a times you may have faced problem where your production infra is on different AWS account and non prod on different account and you are required to restore the RDS snapshot to non prod account for testing.

Recently I got a task to restore my prod account RDS snapshot to a different account for testing purpose. It was a very interesting and new task for me. and I was in an awe, how AWS thinks about what all challenges we may face in real life and provides a solution to it.

For those who are not aware about RDS, I can brief RDS as a relational database service by Amazon Web Services (AWS), it is a managed service so we don’t have to worry about the underlying Operating System and Database software installation, we just have to use it.

Amazon RDS creates a storage volume snapshot of your DB instance backing up the entire DB instance and not just individual database. As I told you, we have to copy and restore an RDS snapshot to a different aws account. There is a catch!, you can directly copy an aws snapshot to a different region in same aws account, but to copy to a different aws account you need to share the snapshot to aws account and then restore from there, so lets begin.

To share an automated DB snapshot, create a manual DB snapshot by copying the automated snapshot, and then share that copy.

Step 1: Find the snapshot that you want to copy, and select it by clicking the checkbox next to it’s name. You can select a “Manual” snapshot, or one of the “Automatic” snapshots that are prefixed by “rds:”.

Step 2: From the “Snapshot Actions” menu, select “Copy Snapshot”.

Step 3: On the page that appears: Select the target region. In this case, since we have to share this snapshot with another aws account we can select existing region.

  • Specify your new snapshot name in the “New DB Snapshot Identifier” field. This identifier must not already be used by a snapshot in the target region.
  • Check the “Copy Tags” checkbox if you want the tags on the source snapshot to be copied to the new snapshot.
  • Under “Encryption”, leave “Disable Encryption” selected.
  • Click the “Copy Snapshot” button.

Step 4: Once you click on “Copy Snapshot”, you can see the snapshot being created.

Step 5: Once the manual snapshot is created, select the created snapshot, and from the “Snapshot Actions” menu, select “Share Snapshot”.

Step 6: Define the “DB snapshot visibility” as private and add the “AWS account ID” to which we want to share the snapshot and click on save.

Till this point we have shared our db snapshot to the aws account where we need to restore the db.
Now login to the other aws account and go to RDS console and check for snapshot that was shared just recently.

Step 7: Select the snapshot and from the “Snapshot Actions” menu select “Restore Snapshot”.

Step 8: From here we just need to restore the db as we do normally. Fill out the required details like “DB Instance class”, “Multi-AZ-Deployment”, “Storage Type”, “VPC ID”, “Subnet group”, “Availability Zone”, “Database Port”, “DB parameter group”, as per the need and requirement.

Step 9: Finally click on “Restore DB instance” and voila !!, you are done.

Step 10: You can see the db creation in process. Finally, you have restored the DB to a different AWS account !!

Conclusion:

So there you go. Everything you need to know to restore a production AWS RDS into a different AWS account. That’s cool !! Isn’t it ?, but I haven’t covered everything. There is a lot more to explore. We will walk through RDS best practices in our next blog, till then keep exploring our other tech blogs !!.

Image source: https://unsplash.com/photos/lRoX0shwjUQ


The closer you think you are, the less you’ll actually see

I hope you have seen the movie Now you see me, it has a famous quote The closer you think you are, the less you’ll actually see. Well, this blog is not about this movie but how I got stuck into an issue, because I was not paying attention and looking at the things closely and seeing less hence not able to resolve the issue.

There is a lot happening in today’s DevOps world. And HashiCorp has emerged out to be a big player in this game. Terraform is one of the open source tools to manage infrastructure as code. It plays well with most of the cloud provider. But with all these continuous improvements and enhancements there comes a possibility of issues as well. Below article is about such a scenario. And in case you have found yourself in the same trouble. You are lucky to reach the right page.
I was learning terraform and performing a simple task to launch an Ubuntu EC2 instance in us-east-1 region. For which I required the AMI Id, which I copied from the AWS console as shown in below screenshot.

Once I got the AMI Id, I tried to create the instance using terraform, below is the screenshot of the code

provider “aws” {
  region     = “us-east-1”
  access_key = “XXXXXXXXXXXXXXXXXX”
  secret_key = “XXXXXXXXXXXXXXXXXXX”
}
resource “aws_instance” “sandy” {
        ami = “ami-036ede09922dadc9b
        instance_type = “t2.micro”
        subnet_id = “subnet-0bf4261d26b8dc3fc”
}
I was expecting to see the magic of Terraform but what I got below ugly error.

Terraform was not allowing to spin up the instance. I tried couple of things which didn’t work. As you can see the error message didn’t give too much information. Finally, I thought of giving it a try by  doing same task via AWS web console. I searched for the same ubuntu AMI and selected the image as shown below. Rest of the things, I kept to default. And well, this time it got launched.

And it confused me more. Through console, it was working fine but while using Terraform it says not allowed. After a lot of hair pulling finally, I found the culprit which is a perfect example of how overlooking small things can lead to blunder.

Culprit

While copying the AMI ID from AWS console, I had copied the 64-bit (ARM) AMI ID. Please look carefully, the below screenshot

But while creating it through console I was selecting the default configuration which by is 64-bit(x86). Look at the below screenshot.

To explain it further, I tried to launch the VM with 64-bit (ARM) manually. And while selecting the AMI, I selected the 64-bit (ARM).

And here is the culprit. 64-bit(ARM) only supports a1 instance type

Conclusion

While launching the instance with the terraform, I tried using 64-bit (ARM) AMI ID mistakenly, primarily because for same AMI there are 2 AMI IDs and it is not very visible to eyes unless you pay special attention.

So folks, next time choosing an AMI ID keep it in mind what type of AMI you are selecting. It will save you a lot of time.

Its not you Everytime, sometimes issue might be at AWS End

Today an issue reported to me that website of our client was loading very slow which was hosted on AWS Windows server and the same website was loading fine when accessed from outside AWS network,I just felt like might be a regular issue but it all together took me to an inside out of the network troubleshooting.

Initially, we checked for SSL certificate expiry, which was not the case, so below are the Two steps which we used to troubleshoot the issue:

Troubleshooting through Browser via Web developer Network tool

In browser we checked which part of code was taking time to load using Network option in developer tools:

  • Select web developer tools in firefox
  • Then select network

We identified one of the GET calls was taking time to load.
Then when this thing was reported to AWS support team they provided further analysis of this. We can save the report as (.HAR) file which tells us below things:

  • How long it takes to fetch DNS information
  • How long each object takes to be requested
  • How long it takes to connect to the server
  • How long it takes to transfer assets from the server to the browser of each object.

Troubleshooting using Traceroute

Then we tried to troubleshoot the AWS network flow using “tracert ” with below output:

Tracing route to example.gov [151.x.x.x] over a maximum of 15 hops:

1 <1 ms <1 ms <1 ms 10.x.x.x

2 * * * Request timed out.

3 * * * Request timed out.

4 * * * Request timed out.

5 * * * Request timed out.

6 * * * Request timed out.

7 <1 ms <1 ms <1 ms 100.x.x.x

8 <1 ms <1 ms 1 ms 52.x.x.x

9 * * * Request timed out.

10 2 ms 1 ms 1 ms example.net [67.x.x.x]

11 2 ms 2 ms 2 ms example.net [67.x.x.x]

12 2 ms 2 ms 2 ms example.net [205.x.x.x]

13 3 ms 3 ms 2 ms 63.x.x.x

14 3 ms 3 ms 3 ms 198.x.x.x

15 4 ms 4 ms 4 ms example.net [63.x.x.x]

And when this was reported to AWS team that RTO from 2-6 we were getting was due to connectivity with internal AWS network which needs to be byepass and was not an issue as packet still reached the next server within 1ms.

Traceroute gives an insight to your network problem.

  • The entire path that a packet travels through
  • Names and identity of routers and devices in your path
  • Network Latency or more specifically the time taken to send and receive data to each devices on the path.

Solution provided by AWS Team

After all the Razzle-Dazzle they just refreshed the network from their end and there was no more website latency after that while accessing from AWS internal network.

Tool recommended by AWS Support team for Network troubleshooting if the issue arises in future:

Wireshark along with .har file using network in web-developer tools from browser.

Wireshark is a network packet analyzer. A network packet analyzer will try to capture network packets and tries to display that packet data as detailed as possible.
You could think of a network packet analyzer as a measuring device used to examine what’s going on inside a network cable, just like a voltmeter is used by an electrician to examine what’s going on inside an electric cable (but at a higher level, of course).
In the past, such tools were either very expensive, proprietary, or both. However, with the advent of Wireshark, all that has changed.

Wireshark is perhaps one of the best open source packet analyzers available today.

Features

The following are some of the many features Wireshark provides:

  • Available for UNIX and Windows.
  • Capture live packet data from a network interface.
  • Open files containing packet data captured with tcpdump/WinDump, Wireshark, and a number of other packet capture programs.
  • Import packets from text files containing hex dumps of packet data.
  • Display packets with very detailed protocol information.
  • Save packet data captured.
  • Export some or all packets in a number of capture file formats.
  • Filter packets on many criteria.
  • Search for packets on many criteria.
  • Colorize packet display based on filters.
  • Create various statistics.

… and a lot more!

Marrying Nginx with ELB

Few weeks back I got a requirement to setup a highly available API server. I said not a big deal! I’ll have Nginx as a reverse proxy(Why not directly exposing API via ELB a different story) and my API auto scaled setup will sit behind an internal ELB and things would be in place TA DA.

Things worked perfectly fine for few days, but one day the API consumer reported that they are not getting response back, what? When I checked the API url was indeed returning a 502 error code. It was really strange for nginx to be sending 502 response back that meant the highly scalable setup was down? Well I was proven wrong ELB was working perfectly fine as the curl request to internal ELB was returning proper response, so yes the highly available API setup was in place. What next, yes! Nginx error logs. I did saw Nginx reporting connection timeout with 502 error code. The interesting thing to note that it was an IP(random IP assigned to ELB), when I tried to do curl hit on that IP for API request, it did failed EUREKA EUREKA!! I reproduced the problem.

Well now I’ve to collect all this information and infer what is the logical cause of the problem, and yes there are lot of smart people available who would have fond the solution to this problem so I’ve to ask right question in Google :). The question was “Nginx using Ip instead of domain name” and the answer was “Nginx caches the IP at the startup and obviously as ELB is Elastic in nature so it’s IP changes over the period of time”. That was the reason Nginx was trying to talk to the older un-associated IP’s of Internal ELB.

Finding solution was not a big task as it was just about making sure  Nginx should talk to ELB not the IP’s associated with it, that’s why said marrying nginx with ELB :).

I’ll not go into the actual solution as there are already solutions available in web. I referred this really good blog as a solution.

http://ghost.thekindof.me/nginx-aws-elb-dns-resolution-nginx-resolver-directive-and-black-magic/

VPC per envrionvment versus Single VPC for all environments

This blog talks about the two possible ways of hosting your infrastructure in Cloud, though it will be more close to hosting on AWS as it is a real life example but this problem can be applied to any cloud infrastructure set-up. I’m just sharing my thoughts and pros & cons of both approaches but I would love to hear from the people reading this blog about their take as well what do they think.

Before jumping right away into the real talk I would like to give a bit of background on how I come up with this blog, I was working with a client in managing his cloud infrastructure where we had 4 environments dev, QA, Pre Production and Production and each environment had close to 20 instances, apart from applications instances there were some admin instances as well such as Icinga for monitoring, logstash for consolidating logs, Graphite Server to view the logs, VPN server to manage access of people.

At this point we got into a discussion that whether the current infrastructure set-up is the right one where we are having a separate VPC per environment or the ideal setup would have been a single VPC and the environments could have been separated by subnet’s i.e a pair of subnet(public private) for each environment

Both approaches had some pros & cons associated with them

Single VPC set-up

Pros:

  1. You only have a single VPC to manage
  2. You can consolidate your admin app’s such as Icinga, VPN server.

Cons:

  1. As you are separating your environments through subnets you need granular access control at your subnet level i.e instances in staging environment should not be allowed to talk to dev environment instances. Similarly you have to control access of people at granular level as well
  2. Scope of human error is high as all the instances will be on same VPC.

VPC per environment setup

Pros:

  1. You have a clear separation between your environments due to separate VPC’s.
  2. You will have finer access control on your environment as the access rules for VPC will effectively be access rules for your environments.
  3. As an admin it gives you a clear picture of your environments and you have an option to clone you complete environment very easily.

Cons:

  1. As mentioned in pros of Single VPC setup you are at some financial loss as you would be duplicating admin application’s across environments

In my opinion the decision of choosing a specific set-up largely depends on the scale of your environment if you have a small or even medium sized environment then you can have your infrastructure set-up as “All environments in single VPC”, in case of large set-up I strongly believe that VPC per environment set-up is the way to go.

Let me know your thoughts and also the points in favour or against of both of these approaches.

How to Manage Amazon Web Services Instances part 1

If you want to minimize the amount of money you spend on Amazon Web Services (AWS) infrastructure, then this blog post is for you. In this post I will be discussing  the rationale behind starting & stopping AWS instances in an automated fashion and more importantly, doing it in a correct way. Obviously you could do it through the web console of AWS as well, but it will need your daily involvement. In addition, you would have to take care of starting/stopping various services running on those instances.

Before directly jumping on how we achieved instance management in an automated fashion, I like to state the problem that we were facing. Our application testing infrastructure is on AWS and it is a multiple components(20+) application distributed among 8-9 Amazon instances. Usually our testing team starts working from 10 am, and continues till 7 pm. Earlier we used to keep our testing infrastructure up for 24 hours, even though we were using it for only 9 hours on weekdays, and not using it at all on weekends. Thus, we were wasting more then 50% of the money that we spent on the AWS infrastructure. The obvious solution to this problem was: we needed an intelligent system that would make sure that our amazon infrastructure was up only during the time when we needed it.

The detailed list of the requirements, and the corresponding things that we did were:

  1. We should shut down our infrastructure instances when we are not using them.
  2. There should be a functionality to bring up the infrastructure manually: We created a group of Jenkins jobs, which were scheduled to run at a specific time to start our infrastructure. Also a set of people have execution access to these jobs to start the infrastructure manually, if the need arises.
  3. We should bring up our infrastructure instances when we need it.
  4. There should be a functionality to shut down the infrastructure manually: We created a group of Jenkins jobs that were scheduled to run at a specific time to shut down our infrastructure. Also a set of people have execution access on these jobs to shut down the infrastructure manually, if the need arises.
  5. Automated application/services start on instance start: We made sure that all the applications and services were up and running when the instance was started.
  6. Automated graceful application/services shut down before instance shut down: We made sure that all the applications and services were gracefully stopped before the instance was shut down, so that there was be no loss of data.
  7. We also had to make sure that all the applications and services should be started as per defined agreed order.

Once we had the requirements ready, implementing them was simple, as Amazon provides a number of APIs to achieve this. We used AWS CLI, and needed to use just 2 simple commands that AWS CLI provides.
The command to start an instance :
aws ec2 start-instances –instance-ids i-XXXXXXXX
The command to stop an instance :
aws ec2 stop-instances –instance-ids i-XXXXXXXX 

Through above commands you can automate starting and stopping AWS instances, but you might not be doing it the correct way. As you didn’t restrict the AWS CLI allow firing of start-instances and stop-instances commands only, you could use other commands and that could turn out to be a problem area. Another important point to consider is that we should restrict the AWS instances on which above commands could be executed, as these commands could be mistakenly run with the instance id of a production amazon instance id as an argument, creating undesirable circumstances 🙂

In the next blog post I will talk about how to start and stop AWS instances in a correct way.

Attach a new volume to EC2 Instance

This blog will talk about how to mount a new volume to an existing EC2 instance, though it is very straightforward & simple, but it’s good to have a checklist ready with you so that you can do things in one go instead of searching here and there. The most important thing to take note of in this blog is that you have to do couple of manual operations apart from mounting the volume through AWS Web UI.

  1. Go to the AWS Volumes screen, create a new volume if not created already.
  2. Select Attach Volume in Actions button
  3. Choose the instance, to which this volume needs to be mounted
  4. Confirm the volume state changes from available to in-use
  5. Go to the AWS Instances screen, select the EC2 instance to which volume was attached
  6. Check the Block Devices in the details section you can see the new volume details their. Let’s say it is mounted at /dev/sdf.
  7. Now log in to the EC2 instance machine, you can’t see the mounted volume yet(it is like an external un-formatted hdd that is connected to a linux box)
  8. To make it usable execute below commands
sudo su –                              [Switch to superuser]
mkfs -t ext3 /dev/xvdf                      [Format the drive if it is a new volume]
mkdir /home/mettl/mongo                      [Simply create a new directory]
mount /dev/xvdf /home/mettl/mongo   [Mount the drive on newly created directory]
Make sure to change permissions according to how you use it.
  1. To mount EBS volumes automatically on startup add an entry in /etc/fstab
/dev/xvdf    /home/mettl/mongo    ext3    defaults,nobootwait,comment=cloudconfig    0    0
Hope you will find this blog useful, rest assured this is the starting point of a new series I would be talking about couple of other best practices such as why do you need to have this kind of setup, how you will upgrade volume in case of a running ec2-instance..