One more reason to use Docker

Recently I was working on a project which includes Terraform and AWS stuff. While working on that I was using my local machine for terraform code testing and luckily everything was going fine. But when we actually want to test it for the production environment we got some issues there. Then, as usual, we started to dig into the issue and finally, we got the issue which was quite a silly one 😜. The production server Terraform version and my local development server Terraform version was not the same. 

After wasting quite a time on this issue, I decided to come up with a solution so this will never happen again.

But before jumping to the solution, let’s think is this problem was only related to Terraform or do we have faced the similar kind of issue in other scenarios as well.

Well, I guess we face a similar kind of issue in other scenarios as well. Let’s talk about some of the scenario’s first.

Suppose you have to create a CI pipeline for a project and that too with code re-usability. Now pipeline is ready and it is working fine in your project and then after some time, you have to implement the same kind of pipeline for the different project. Now you can use the same code but you don’t know the exact version of tools which you were using with CI pipeline. This will lead you to error elevation. 

Let’s take another example, suppose you are developing something in any of the programming languages. Surely that utility or program will have some dependencies as well. While installing those dependencies on the local system, it can corrupt your complete system or package manager for dependency management. A decent example is Pip which is a dependency manager of Python😉.

These are some example scenarios which we have faced actually and based on that we got the motivation for writing this blog.

The Solution

To resolve all this problem we just need one thing i.e. containers. I can also say docker as well but container and docker are two different things.

But yes for container management we use docker.

So let’s go back to our first problem the terraform one. If we have to solve that problem there are multiple ways to solve this. But we tried it to solve this using Docker.

As Docker says

Build Once and Run Anywhere

So based on this statement what we did, we created a Dockerfile for required Terraform version and stored it alongside with the code. Basically our Dockerfile looks like this:-

FROM alpine:3.8

MAINTAINER OpsTree.com

ENV TERRAFORM_VERSION=0.11.10

ARG BASE_URL=https://releases.hashicorp.com/terraform

RUN apk add --no-cache curl unzip bash \
    && curl -fsSL -o /tmp/terraform.zip ${BASE_URL}/${TERRAFORM_VERSION}/terraform_${TERRAFORM_VERSION}_linux_amd64.zip \
    && unzip /tmp/terraform.zip -d /usr/bin/

WORKDIR /opstree/terraform

USER opstree

In this Dockerfile, we are defining the version of Terraform which needs to run the code.
In a similar fashion, all other above listed problem can be solved using Docker. We just have to create a Dockerfile with exact dependencies which are needed and that same file can work in various environments and projects.

To take it to the next level you can also dump a Makefile as well to make everyone life easier. For example:-

IMAGE_TAG=latest
build-image:
    docker build -t opstree/terraform:${IMAGE_TAG} -f Dockerfile .

run-container:
    docker run -itd --name terraform -v ~/.ssh:/root/.ssh/ -v ~/.aws:/root/.aws -v ${PWD}:/opstree/terraform opstree/terraform:${IMAGE_TAG}

plan-infra:
    docker exec -t terraform bash -c "terraform plan"

create-infra:
    docker exec -t terraform bash -c "terraform apply -auto-approve"

destroy-infra:
    docker exec -t terraform bash -c "terraform destroy -auto-approve"

And trust me after making this utility available the reactions of the people who will be using this utility will be something like this:-

Now I am assuming you guys will also try to simulate the Docker in multiple scenarios as much as possible.

There are a few more scenarios which yet to be explored to enhance the use of Docker if you find that before I do, please let me know.

Thanks for reading, I’d really appreciate any and all feedback, please leave your comment below if you guys have any feedback.

Cheers till the next time.

Tuning Of ElasticSearch Cluster

Related image

Store, Search And Analyse!

Scenario

The first thing which comes in mind when I hear about logging solutions in my infrastructure is ELK (Elasticsearch, Logstash, Kibana).
But, what happens when logs face an upsurge in the quantity and hamper performance, which, in Elasticsearch words, we may also call “A Fall Back”
We need to get control of situation, and optimize our setup. For which, we require a need for tuning the Elasticsearch

What Is ElasticSearch?

It is a java based, open-source project build over Apache Lucene and released under the license of Apache. It has the ability to store, search and analyse document files in diverse format.

A Bit Of History

Image result for history drawing

Shay Banon was the founder of compass project, thought of need to create a scalable search engine which could support other languages than java.
Therefore, he started to build a whole new project which was the 3rd version of compass using JSON and HTTP interface. The first version of which was released in 2010.

ElasticSearch Cluster

Elasticsearch is a java based project which runs on Java Virtual Machines, wherein each JVM server is considered to be an elasticsearch node. In order to support scalability, elasticsearch holds up the concept of cluster in which multiple nodes runs on one or more host machines which can be grouped together into a cluster which has a unique name.
These clustered nodes holds up the entire data in the form of documents and provides the functionality of indexing and search of those documents.

Types Of Nodes:-

  • Master Eligible-Node
    Masters are meant for cluster/admin operations like allocation, state maintenance, index/alias creation, etc
  • Data Node
    Data nodes hold data and perform data-related operations such as CRUD, search, and aggregations.
  • Ingest Node
    Ingest nodes are able to apply an ingest pipeline to a document in order to transform and enrich the document before indexing.
  • Tribe Node
    It is a special type of coordinating node that can connect to multiple clusters and perform search and other operations across all connected clusters.
Image result for nodes in elasticsearch cluster

Shards and Replicas

  • Shards: Further dividing index into multiple entities are called shards
  • Replicas: Making one or more copies of the index’s shards called as replica shards or simple replicas

By default in Elasticsearch every index is allocated with 5 primary shards and single replica of each shard. That means for every index there will be 5 primary shards and replication of each will result in total of 10 shards per index.

Image result for shards and replicas in elasticsearch cluster

Types Of Tuning in ElasticSearch:- 

Index Performance Tuning

  • Use Bulk Requests
  • Use multiple workers/threads to send data to Elasticsearch
  • Unset or increase the refresh interval
  • Disable refresh and replicas for initial loads
  • Disable swapping
  • Give memory to the file-system cache
  • Use faster hardware
  • Indexing buffer size ( Increase the size of the indexing buffer – JAVA Heap Size )

Search Performance Tuning

  • Give memory to the file-system cache
  • Use faster hardware
  • Document modeling (documents should not be joined)
  • Search as few fields as possible
  • Pre-index data (give values to your search)
  • Shards might help with throughput, but not always
  • Use index sorting to speed up conjunctions
  • Warm up the file-system cache (index.store.preload)

Why Is ElasticSearch Tuning Required?

Elasticsearch gives you moderate performance for search and injection of logs maintaining a balance. But when the service utilization or service count within the infrastructure grows, logs grow in similar proportion. One could easily scale the cluster vertically, but that would increase the cost.
Instead, you can tune the cluster as per the requirement(Search or Injection) while maintaining the cost constrains.

Tune-up

How to handle 20k logs ingestion per sec?
For such high data volume ingestion into elastic search cluster, you would be somehow compromising the search performance.Starting step is to choose the right compute system for the requirement, prefer high compute for memory over CPU. We are using m5.2xlarge(8 CPU/32 GB) as data nodes and t3.medium (2 CPU/ 4 GB) for master.

Elasticsearch Cluster Size
Master – 3 (HA – To avoid the split-brain problem) or 1 (NON-HA)
Data Node – 2

Configure JVM
The optimal or minimal configuration for JVM heap size for the cluster is 50% of the memory of the server.
File: jvm.option
Path: /etc/elasticsearch/

             - Xms16g
             - Xmx16g

Update system file size and descriptors

             - ES_HEAP_SIZE=16g
             - MAX_OPEN_FILES=99999
             - MAX_LOCKED_MEMORY=unlimited


Dynamic APIs for tuning index performance
With respect to the index tuning performance parameters, below mentioned are the dynamic APIs (Zero downtime configuration update) for tuning the parameters.

Updating Translog
Translog is included in every shard which maintains the persistence of every log by recording every non-committed index operation.
Changes that happen after one commit and before another will be lost in the event of process exit or hardware failure.
To prevent this data loss, each shard has a transaction log or write-ahead log associated with it. Any index or delete operation is written to the translog after being processed by the internal Lucene index.

async – In the event of hardware failure, all acknowledged writes since the last automatic commit will be discarded.

Setting translog to async will increase the index write performance, but do not guarantee data recovery in case of hardware failure. 

 curl -H "Content-Type: application/json" -XPUT "localhost:9200/_all/_settings?timeout=180s" -d ' 
{

"index.translog.durability" : "async"
}'

Timeout
Adjust the time period of operation with respect to the number of indexes. Larger number of indexes, higher would be the timeout value.

Number of Replicas to minimal

In case there is a requirement of ingestion of data in large amount (same scenario as we have), we should set the replica to ‘0‘. This is risky as loss of any shard will cause a data loss as no replica set exist for it. But also at the same time index performance will significantly increase as the document has to be index just once, without replica.
After you are done with the load ingestion, you can revert back the same setting.

curl -H "Content-Type: application/json" -XPUT "localhost:9200/_all/_settings?timeout=180s" -d ' 
{                        
"number_of_replicas": 0 
}'

Increase the Refresh Interval
Making the indexes available for search is the operation called as refresh, and that’s a costly operation in terms of resources. Calling it too frequently can compromise the index write performance.

The Default settings for elasticsearch is to refresh the indexes every second for which the the search request is consecutive in the last 30 seconds.
This is the most appropriate configuration if you have no or very little search traffic and want to optimize for indexing speed.

In case, if your index participate in frequent search requests, in this scenario Elasticsearch will refresh the index every second. If you can bear the expense to increase the amount of time between when a document gets indexed and when it becomes visible, increasing the index.refresh_interval to a grater value, e.g. 300s(5 mins), might help improve indexing performance.

curl -H "Content-Type: application/json" -XPUT 'localhost:9200/_all/_settings?timeout=180s' -d
'{"index" : 
                 {    "refresh_interval" : "300s"    }              
}'

Decreasing number of shards
Changing the number of shards can be achieved by _shrink and _split APIs. As the name suggests, to increase the number of shards split can be used and shrink for decrease.
By default in Elasticsearch every index is allocated with 5 primary shards and single replica of each shard. That means for every index there will be 5 primary shards and replication of each will result in total of 10 shards per index.

curl -H "Content-Type: application/json" -XPUT "localhost:9200/_all/_settings?timeout=180s" -d '
 { 
     "number_of_shards": 1
 }'

When Logstash is Input
Logstash provides the following configurable options for tuning pipeline performance:

  • Pipeline Workers
  • Pipeline Batch Size
  • Pipeline Batch Delay 
  • Profiling the Heap

Configuring the above parameters would help in increasing the injection rate (index performance), as the above parameters work for feeding in elasticsearch.

Summary

ElasticSearch tuning is very complex and critical task as it can give some serious damage to your cluster or break down the whole. So be careful while modifying any parameters on production environment.

ElasticSearch tuning can be extensively used to add values to the logging system, also meeting the cost constrains.

Happy Searching!

References: https://www.elastic.co
Image References: https://docs.bonsai.io/article/122-shard-primer  https://innerlives.org/2018/10/15/image-magic-drawing-the-history-of-sorcery-ritual-and-witchcraft/

The Concept Of Data At Rest Encryption In MySql

Word “data” is very crucial since early 2000 and within a span of these 2 decades is it becoming more crucial. According to Forbes Google believe that in future every organisation will lead to becoming a data company. Well, when it comes to data, security is one of the major concerns that we have to face. 

We have several common techniques to store data in today’s environment like MySql, Oracle, MsSql, Cassandra, Mongo etc and these techs will keep on changing in future. But according to DataAnyz, MySql Still has a 33% share of the market. So here we are with a technique to secure our MySQL data.

Before getting more into this article, let us know what are possible combined approaches to secure MySQL data 

  1. Mysql Server hardening
  2. Mysql Application-level hardening
  3. Mysql data encryption at transit
  4. Mysql data at rest encryption
  5. Mysql Disk Encryption

You may explore all the approaches but in this article, we will understand the concept of Mysql data at encryption and hands-on too.

The concept of  “Data at Rest Encryption”  in MySQL was introduced in Mysql 5.7 with the initial support of InnoDB storage engine only and with the period it has evolved significantly. So let’s understand about “Data at Rest Encryption” in MySQL 

What is “Data at Rest Encryption”  in MySql?

The concept of  “data at rest encryption” uses two-tier encryption key architecture, which used below two keys 

  1. Tablespace keys: This is an encrypted key which is stored  in the tablespace header 
  2. Master Key: the Master key is used to decrypt the tablespace keys

So let’s Understand its working

Let’s say we have a running MySQL with InnoDB storage engine and tablespace is encrypted using a key, referred as table space key. This key is then encrypted using a master key and stored in the tablespace header 

Now when a request is made to access MySQL data, InnoDB use master key to decrypt tablespace key present tablespace header. After getting decrypted tablespace key, the tablespace is decrypted and make is available to perform read/write operations

Note: The decrypted version of a tablespace key never changes, but the master key can be rotated.

Data at rest encryption implemented using keyring file plugin to manage and encrypt the master key

After understanding the concept of encryption and decryption below are few Pros and Cons for using  DRE

Pros:

  • A strong Encryption of AES 256 is used to encrypt the InnoDB tables
  • It is transparent to all applications as we don’t need any application code, schema, or data type changes
  • Key management is not done by DBA.
  • Keys can be securely stored away from the data and key rotation is very simple.

Cons:

  • Encrypts only  InnoDB tables
  • Can’t encrypt  binary logs, redo logs, relay logs on unencrypted slaves, slow log, error log, general log, and audit log

Though we can’t encrypt binary logs, redo logs, relay logs on Mysql 5.7 but MariaDB has implemented this with a mechanism to encrypt undo/redo logs, binary logs/relay logs, etc.  by enabling few flags in MariaDB Config File

innodb_sys_tablespace_encrypt=ON
innodb_temp_tablespace_encrypt=ON
innodb_parallel_dblwr_encrypt=ON
innodb_encrypt_online_alter_logs=ON
innodb_encrypt_tables=FORCE
encrypt_binlog=ON
encrypt_tmp_files=ON

However, there are some limitations 

Let’s Discuss its problem/solutions and few solutions to them

  1. Running MySQL on a host will have access from root user and the MySQL user and both of them may access key file(keyring file) present on the same system. For this problem, we may have our keys on mount/unmount drive which can be unmounted after restarting MySQL.
  2. Data will not be in encrypted form when it will get loaded onto the RAM and can be dumped and read
  3. If MySQL is restarted with skip-grant-tables then again it’s havoc but this can be eliminated using an unmounted drive for keyring
  4.  As tablespace key remains the same so our security relies on Master key rotation which can be used  to save our master key 

NOTE: Do not to lose the master key file, as we cant decrypt data and will suffer data loss

Doing Is Learning, so let’s try 

As a prerequisite, we need a machine with MySQL server up and running Now for data at rest encryption to work we need to enable 

Enable file per table on with the help of the configuration file.  

 
[root@mysql ~]#  vim /etc/my.cnf
[mysqld]

innodb_file_per_table=ON

Along with the above parameter, enable keyring plugin and keyring path. This parameter should always be on the top in configuration so that it will get load initially when MySQL starts up. Keyring plugin is already installed in MySQL server we just need to enable it. 

[root@mysql ~]#  vim /etc/my.cnf
[mysqld]
early-plugin-load=keyring_file.so
keyring_file_data=/var/lib/mysql/keyring-data/keyring
innodb_file_per_table=ON

And save the file with a restart to MySQL

[root@mysql ~]#  systemctl restart mysql

We can check for the enabled plugin and verify our configuration.

mysql> SELECT plugin_name, plugin_status FROM INFORMATION_SCHEMA.PLUGINS WHERE plugin_name LIKE 'keyring%';
+--------------+---------------+
| plugin_name  | plugin_status |
+--------------+---------------+
| keyring_file | ACTIVE        |
+--------------+---------------+
1 rows in set (0.00 sec)


verify that we have a running keyring plugin and its location

mysql>  show global variables like '%keyring%';
+--------------------+-------------------------------------+
| Variable_name      | Value                 |
+--------------------+-------------------------------------+
| keyring_file_data  | /var/lib/mysql/keyring-data/keyring |
| keyring_operations | ON                                  |
+--------------------+-------------------------------------+
2 rows in set (0.00 sec)

Verify that we have enabled file per table 

MariaDB [(none)]> show global variables like 'innodb_file_per_table';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_file_per_table | ON   |
+-----------------------+-------+
1 row in set (0.33 sec)

Now we will test our set up by creating a test DB with a table and insert some value to the table using below commands 

mysql> CREATE DATABASE test_db;
mysql> CREATE TABLE test_db.test_db_table (id int primary key auto_increment, payload varchar(256)) engine=innodb;
mysql> INSERT INTO test_db.test_db_table(payload) VALUES('Confidential Data');

After successful test data creation, run below command from the Linux shell to check whether you’re able to read InnoDB file for your created table i.e. Before encryption

Along with that, we see that our keyring file is also empty before encryption is enabled

[root@mysql ~]#  strings /var/lib/mysql/test_db/test_db_table.ibd
infimum
supremum
Confidential DATA

 

At this point of time if we try to check our keyring file we will not find anything

[root@mysql ~]#  cat /var/lib/mysql/keyring
[root@mysql ~]# 

Now let’s encrypt our table with below command and check our InnoDB file and keyring file content.

mysql> ALTER TABLE test_db.test_db_table encryption='Y';
[root@mysql ~] strings /var/lib/mysql/test_db/test_db_table.ibd
0094ca6d-7ba9-11e9-b0d0-0800275716d42QMw

The above content clear that file data is not readable and table space is encrypted. As previously oy keyring file data was absent/empty, so now it must be having some data.

 Note: Please look  master Key and time stamp(we will implement key rotation )

[root@mysql ~]  cat /var/lib/mysql/keyring-data/keyring
Keyring file version:1.0?0 INNODBKey-0094ca6d-7ba9-11e9-b0d0-0800275716d4-2AES???_gd?7m>0??nz??8M??7Yʹ:ll8@?0 INNODBKey-0094ca6d-7ba9-11e9-b0d0-0800275716d4-1AES}??x?$F?z??$???:??k?6y?YEOF
[root@mysql ~] ls -ltr /var/lib/mysql/keyring-data/keyring
-rw-r----- 1 mysql mysql 283 Sep 18 16:48 /var/lib/mysql/keyring-data/keyring

With known security concern for the compromised master key, we may use the master key rotation technique from time to time to save our key.

mysql> alter instance rotate innodb master key;
Query OK, 0 rows affected (0.00 sec)

After this command, we realise that our key timestamp is changed now and we have a new key. 

[root@mysql ~] ls -ltr /var/lib/mysql/keyring-data/keyring
-rw-r----- 1 mysql mysql 411 Sep 18 18:17 /var/lib/mysql/keyring-data/keyring

Some Useful Commands

Below are some helpful commands we may use in an encrypted system 

1. List All the tables with encryption enabled 

mysql> SELECT * FROM information_schema.tables WHERE create_options LIKE '%ENCRYPTION="Y"%' \G;
*************************** 1. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: sample_db
TABLE_NAME: test_db_table
TABLE_TYPE: BASE TABLE
ENGINE: InnoDB
VERSION: 10
ROW_FORMAT: Dynamic
TABLE_ROWS: 0
AVG_ROW_LENGTH: 0
DATA_LENGTH: 16384
MAX_DATA_LENGTH: 0
INDEX_LENGTH: 0
DATA_FREE: 0
AUTO_INCREMENT: 2
CREATE_TIME: 2019-09-18 16:46:34
UPDATE_TIME: 2019-09-18 16:46:34
CHECK_TIME: NULL
TABLE_COLLATION: latin1_swedish_ci
CHECKSUM: NULL
CREATE_OPTIONS: ENCRYPTION="Y"
TABLE_COMMENT: 
1 row in set (0.02 sec)

ERROR: 
No query specified

2. Encrypt Tables in a Database 

mysql> ALTER TABLE db.t1 ENCRYPTION='Y';

3. Disable encryption for an InnoDB table

mysql> ALTER TABLE t1 ENCRYPTION='N';

Conclusion : 

You can encrypt data at rest by using keyring plugin and we can control and manage it by master key rotation. Creating an encrypted Mysql data file setup is as simple as firing a few simple commands. Using an encrypted system is also transparent to services, applications, and users with minimal impact of system resources. Further with Encryption of data at rest, we may also implement encryption in transit. 

I hope you found this article informative and interesting. I’d really appreciate any and all feedback.

Achieve SSO in Privately Hosted Jenkins

Introduction

Providing OAuth 2.0 user authentication directly or using Google+ Sign-in reduces your CI overhead. It also provides a trusted and secure login system that’s familiar to users, consistent across devices, and removes the burden of users having to remember another username and password. One of the hurdles in implementing a Gmail authentication is that Google developer console and your  Jenkins server should be in the same network or in simple terms they can talk to each other.

Resources Used

  • Privately Hosted Jenkins
  • Google developer console
  • Ngrok
In this blog, I’m trying to explain how to integrate Gmail authentication feature in your privately hosted Jenkins server so that you get free of filling the form by the time of creating a new user.

Setup 1: Setup Ngrok

NGROK
 
Ngrok is multiplatform tunneling, reverse proxy software that establishes secure tunnels from a public endpoint such as the internet to a locally running network service while capturing all traffic for detailed inspection and replay.
We are using Ngrok to host our Jenkins service (running on port 8080) to public IP.

 
Go to google and search for Download Ngrok.
 
 
 
Either Login with google account or do Ngrok own signup.
 
 
After Logged in Ngrok Download it.
 
 
After Download Ngrok, Go to the console and unzip the downloaded zip file and then move it to /usr/local/bin.
Note: Moving part is optional, we do so for accessing ngrok from anywhere.
 
 
 
Go to ngrok UI page , copy the authentication key and paste it.
Note: Remove ” . / ” sign because we moved ngrok file to /usr/local/bin
 
 
 Major configuration for Ngrok is done. Now type the command:
ngrok http 8080
 Assuming that Jenkins is running on port 8080.
 
 
Now Ngrok Host our Jenkins Service to public IP.
 
Copy this IP, we will use it in the google developer console.
 
Note: Make this terminal up and running.(don’t do ctrl+c)

Step 2: Setup Google Developer Console

Go to google and search for google developer console.
 
 
After sign in into google developer console, we will redirect to Google developer console UI screen.
Go to Select a project  → New Project
 
 
Give Project Name, here I will use “JenkinsGmailAuthentication” and create a project. Creating a project takes 1 or 2 minutes.
 
 
After Project created, we will be redirected to the UI page as shown below. Now click on on the “Credentials” Tab on the left slide bar.
 
 
 
After Go to the OAuth consent screen tab and give the below entries. Here I will give Application name to “JenkinsGmailAuthentication”.
 
 
The important part of the Google developer console is Public IP we created using Ngrok. Copy Public IP in Authorized domains and note to remove ” http:// ” in Authorized domains.
 
 
After Setting OAuth consent screen, Go to   “Credentials Tab”→ Create Credentials→OAuthClientID
 
 
Select Application type as Web Application, give the name “JenkinsGmailAuthentication”.
Major Part of Create Credential has Authorized JavaScript origins and Authorized redirect URIs.
 
 
Copy Client ID and Client Secret because we are going to use these in Jenkins.
 

Step 3: Setup Jenkins

I am assuming that Jenkins is already installed in your system.
Go to Manage Jenkins → Manage Plugins→ Available
 
 
Search for “Google Login Plugin” and add it.
 
 
Go to Manage Jenkins → Configure Global Security
 
 
The major part of Jenkins Setup is to Configure Global Security.
Check the Enable security → Login with Google and Paste the Client ID and Client secret generated in Create Credential Step and Save.
 
 
Up to here, we are done with the Setup part.
Now Click on login button on Jenkins UI, you will redirect to Gmail for login.
 
 
Select the account from which you want to log in.
 
 
After selecting Account you will redirect to Jenkins and you are logged in as selected user.
 
 
You may be facing a problem when you log in again.
Logout from the current user and login again.
 
 
After redirected to Gmail select another user.
 
 
After selecting user you will be redirected to Error Page showing: HTTP ERROR 404.
 
 
Don’t worry, you have to just remove “securityRealm/” or enter again “localhost:8080”.
 
 
You are logged in with the selected user.
 
 
So now you know how to do Gmail Authentication between Google developer console and Jenkins when they are not directly reachable to each other.
Here the main bridge between both is Ngrok which host our Privately hosted Jenkins to outer internet.
 
 
 

Unix File Tree Part-2

For those who have surfed straight to this blog, please check out the previous part of this series Unix File Tree Part-1 and those who have stayed tuned for this part, welcome back.In the previous part, we discussed the philosophy and the need for file tree. In this part, we will dive deep into the significance of each directory.

Image result for horizontal file tree linux

Dayum!! that’s a lot of stuff to gulp at once, we’ll kick out things one after the other.

Major directories

Let’s talk about the crucial directories which play a major role.

  • /bin: When we started crawling on Linux this helped us to get on our feet yes, you read it right whether you want to copy any file, move it somewhere, create a directory, find out date, size of a file, all sorts of basic operations without which the OS won’t even listen to you (Linux yawning meanwhile) happens because of the executables present in this directory. Most of the programs in /bin are in binary format, having been created by a C compiler, but some are shell scripts in modern systems.
  • /etc: When you want things to behave the way you want, you go to /etc and put all your desired configuration there (Imagine if your girlfriend has an /etc life would have been easier). whether it is about various services or daemons running on your OS it will make sure things are working the way you want them to.
  • /var: He is the guy who has kept an eye over everything since the time you have booted the system (consider him like Heimdall from Thor). It contains files to which the system writes data during the course of its operation. Among the various sub-directories within /var are /var/cache (contains cached data from application programs), /var/games(contains variable data relating to games in /usr), /var/lib (contains dynamic data libraries and files), /var/lock (contains lock files created by programs to indicate that they are using a particular file or device), /var/log (contains log files), /var/run (contains PIDs and other system information that is valid until the system is booted again) and /var/spool (contains mail, news and printer queues).
  • /proc: You can think of /proc just like thoughts in your brain which are illusions and virtual. Being an illusionary file system it does not exist on disk instead, the kernel creates it in memory. It is used to provide information about the system (originally about processes, hence the name). If you navigate to /proc The first thing that you will notice is that there are some familiar-sounding files, and then a whole bunch of numbered directories. The numbered directories represent processes, better known as PIDs, and within them, a command that occupies them. The files contain system information such as memory (meminfo), CPU information (cpuinfo), and available filesystems.
  • /opt: It is like a guest room in your house where the guest stayed for prolong period and became part of your home. This directory is reserved for all the software and add-on packages that are not part of the default installation.
  • /usr: In the original Unix implementations, /usr was where the home directories of the users were placed (that is to say, /usr/someone was then the directory now known as /home/someone). In current Unixes, /usr is where user-land programs and data (as opposed to ‘system land’ programs and data) are. The name hasn’t changed, but its meaning has narrowed and lengthened from “everything user related” to “user usable programs and data”. As such, some people may now refer to this directory as meaning ‘User System Resources’ and not ‘user’ as was originally intended.

Potato or Potaaato what is the difference? 

We’ll be discussing those directories which confuse us always, which have almost a similar purpose but still are in separate locations and when asked about them we go like ummmm…….

/bin vs /usr/bin vs /sbin vs /usr/local/bin

This might get almost clear out when I explained the significance of /usr in the above paragraph. Since Unix designers planned /usr to be the local directories of individual users so it contained all of the sub-directories like /usr/bin, /usr/sbin, /usr/local/bin. But the question remains the same how the content is different?

/usr/bin:

  • /usr/bin is a standard directory on Unix-like operating systems that contains most of the executable files that are not needed for booting or repairing the system. 
  • A few of the most commonly used are awk, clear, diff, du, env, file, find, free, gzip, less, locate, man, sudo, tail, telnet, time, top, vim, wc, which, and zip.

/usr/sbin:

  • The /usr/sbin directory contains non-vital system utilities that are used after booting.
  • This is in contrast to the /sbin directory, whose contents include vital system utilities that are necessary before the /usr directory has been mounted (i.e., attached logically to the main filesystem). 
  • A few of the more familiar programs in /usr/sbin are adduser, chroot, groupadd, and userdel. 
  • It also contains some daemons, which are programs that run silently in the background, rather than under the direct control of a user, waiting until they are activated by a particular event or condition such as crond and sshd.

I hope I have covered most of the directories which you might come across frequently and your questions must have been answered.
Now that we know about the significance of each UNIX directory, It’s time to use them wisely the way they are supposed to be.
Please feel free to reach me out for any suggestions.
Goodbye till next time!

References: https://www.tldp.org/LDP/Linux-Filesystem-Hierarchy/html/usr.htmlhttps://askubuntu.com/questions/130186/what-is-the-rationale-for-the-usr-directoryhttps://askubuntu.com/questions/308045/differences-between-bin-sbin-usr-bin-usr-sbin-usr-local-bin-usr-localhttp://index-of.es/Varios-2/How%20Linux%20Works%20What%20Every%20Superuser%20Should%20Know.pdf
https://imgflip.com/memegenerator

Jenkins Pipeline Global Shared Libraries

Although, the coding language used here is groovy but Jenkins does not allow us to use Groovy to its fullest,  so we can say that Jenkins Pipelines are not exactly groovy. Classes that you may write in src, they are processed in a “special Jenkins way” and you have no control over this. Depending on the various scenarios objects in Groovy don’t behave as you would expect objects to work.

Our thought is putting all pipeline functions in vars is much more practical approach, while there is no other good way to do inheritance, we wanted to use Jenkins Pipelines the right way but it has turned out to be far more practical to use vars for global functions.

Practical Strategy
As we know Jenkins Pipeline’s shared library support allows us to define and develop a set of shared pipeline helpers in this repository and provides a straightforward way of using those functions in a Jenkinsfile.This simple example will just illustrate how you can provide input to a pipeline with a simple YAML file so you can centralize all of your pipelines into one library. The Jenkins shared library example:And the example app that uses it:

Directory Structure

You would have the following folder structure in a git repo:

└── vars
    ├── opstreePipeline.groovy
    ├── opstreeStatefulPipeline.groovy
    ├── opstreeStubsPipeline.groovy
    └── pipelineConfig.groovy

Setting up Library in Jenkins Console.

This repo would be configured in under Manage Jenkins > Configure System in the Global Pipeline Libraries section. In that section Jenkins requires you give this library a Name. Example opstree-library

Pipeline.yaml

Let’s assume that project repository would have a pipeline.yaml file in the project root that would provide input to the pipeline:Pipeline.yaml

ENVIRONMENT_NAME: test
SERVICE_NAME: opstree-service
DB_PORT: 3079
REDIS_PORT: 6079

Jenkinsfile

Then, to utilize the shared pipeline library, the Jenkinsfile in the root of the project repo would look like:

@Library ('opstree-library@master') _
opstreePipeline()

PipelineConfig.groovy

So how does it all work? First, the following function is called to get all of the configuration data from the pipeline.yaml file:

def call() {
  Map pipelineConfig = readYaml(file: "${WORKSPACE}/pipeline.yaml")
  return pipelineConfig
}

opstreePipeline.groovy

You can see the call to this function in opstreePipeline(), which is called by the Jenkinsfile.

def call() {
    node('Slave1') {

        stage('Checkout') {
            checkout scm
        }

         def p = pipelineConfig()

        stage('Prerequistes'){
            serviceName = sh (
                    script: "echo ${p.SERVICE_NAME}|cut -d '-' -f 1",
                    returnStdout: true
                ).trim()
        }

        stage('Build & Test') {
                sh "mvn --version"
                sh "mvn -Ddb_port=${p.DB_PORT} -Dredis_port=${p.REDIS_PORT} clean install"
        }

        stage ('Push Docker Image') {
            docker.withRegistry('https://registry-opstree.com', 'dockerhub') {
                sh "docker build -t opstree/${p.SERVICE_NAME}:${BUILD_NUMBER} ."
                sh "docker push opstree/${p.SERVICE_NAME}:${BUILD_NUMBER}"
            }
        }

        stage ('Deploy') {
            echo "We are going to deploy ${p.SERVICE_NAME}"
            sh "kubectl set image deployment/${p.SERVICE_NAME} ${p.SERVICE_NAME}=opstree/${p.SERVICE_NAME}:${BUILD_NUMBER} "
            sh "kubectl rollout status deployment/${p.SERVICE_NAME} -n ${p.ENVIRONMENT_NAME} "

    }
}

You can see the logic easily here. The pipeline is checking if the developer wants to deploy on which environment what db_port needs to be there.

Benefits

The benefits of this approach are many, some of them are as mentioned below:

  • How to write groovy code is now none of the developer’s perspective.
  • Structure of the Pipeline.yaml is really flexible, where entire data structures can be passed as input to the pipeline.
  • Code redundancy saved to a large extent.

 Jenkinsfiles could actually just look more commonly, like this:

@Library ('opstree-library@master') _
opstreePipeline()

and opstreePipeline() would just read the the project type from pipeline.yaml and dynamically run the exact function, like opstreeStatefulPipeline(), opstreeStubsPipeline.groovy() . since pipeline are not exactly groovy, this isn’t possible. So one of the drawback is that each project would have to have a different-looking Jenkinsfile. The solution is in progress!So, what do you think?

Reference links: 
Image: Google image search (jenkins.io)

AWS RDS cross account snapshot restoration

Many a times you may have faced problem where your production infra is on different AWS account and non prod on different account and you are required to restore the RDS snapshot to non prod account for testing.

Recently I got a task to restore my prod account RDS snapshot to a different account for testing purpose. It was a very interesting and new task for me. and I was in an awe, how AWS thinks about what all challenges we may face in real life and provides a solution to it.

For those who are not aware about RDS, I can brief RDS as a relational database service by Amazon Web Services (AWS), it is a managed service so we don’t have to worry about the underlying Operating System and Database software installation, we just have to use it.

Amazon RDS creates a storage volume snapshot of your DB instance backing up the entire DB instance and not just individual database. As I told you, we have to copy and restore an RDS snapshot to a different aws account. There is a catch!, you can directly copy an aws snapshot to a different region in same aws account, but to copy to a different aws account you need to share the snapshot to aws account and then restore from there, so lets begin.

To share an automated DB snapshot, create a manual DB snapshot by copying the automated snapshot, and then share that copy.

Step 1: Find the snapshot that you want to copy, and select it by clicking the checkbox next to it’s name. You can select a “Manual” snapshot, or one of the “Automatic” snapshots that are prefixed by “rds:”.

Step 2: From the “Snapshot Actions” menu, select “Copy Snapshot”.

Step 3: On the page that appears: Select the target region. In this case, since we have to share this snapshot with another aws account we can select existing region.

  • Specify your new snapshot name in the “New DB Snapshot Identifier” field. This identifier must not already be used by a snapshot in the target region.
  • Check the “Copy Tags” checkbox if you want the tags on the source snapshot to be copied to the new snapshot.
  • Under “Encryption”, leave “Disable Encryption” selected.
  • Click the “Copy Snapshot” button.

Step 4: Once you click on “Copy Snapshot”, you can see the snapshot being created.

Step 5: Once the manual snapshot is created, select the created snapshot, and from the “Snapshot Actions” menu, select “Share Snapshot”.

Step 6: Define the “DB snapshot visibility” as private and add the “AWS account ID” to which we want to share the snapshot and click on save.

Till this point we have shared our db snapshot to the aws account where we need to restore the db.
Now login to the other aws account and go to RDS console and check for snapshot that was shared just recently.

Step 7: Select the snapshot and from the “Snapshot Actions” menu select “Restore Snapshot”.

Step 8: From here we just need to restore the db as we do normally. Fill out the required details like “DB Instance class”, “Multi-AZ-Deployment”, “Storage Type”, “VPC ID”, “Subnet group”, “Availability Zone”, “Database Port”, “DB parameter group”, as per the need and requirement.

Step 9: Finally click on “Restore DB instance” and voila !!, you are done.

Step 10: You can see the db creation in process. Finally, you have restored the DB to a different AWS account !!

Conclusion:

So there you go. Everything you need to know to restore a production AWS RDS into a different AWS account. That’s cool !! Isn’t it ?, but I haven’t covered everything. There is a lot more to explore. We will walk through RDS best practices in our next blog, till then keep exploring our other tech blogs !!.

Image source: https://unsplash.com/photos/lRoX0shwjUQ