Tuning Of ElasticSearch Cluster


Store, Search And Analyse!

Scenario

The first thing that comes to mind when I hear about logging solutions for my infrastructure is ELK (Elasticsearch, Logstash, Kibana).
But what happens when logs surge in volume and start to hamper performance, which in Elasticsearch terms we might call "a fallback"?
We need to get the situation under control and optimize our setup, and that is where tuning Elasticsearch comes in.

What Is ElasticSearch?

It is a Java-based, open-source project built on top of Apache Lucene and released under the Apache license. It can store, search, and analyse documents in diverse formats.

A Bit Of History


Shay Banon, the founder of the Compass project, saw the need for a scalable search engine that could be used from languages other than Java.
He therefore started a whole new project, effectively the third version of Compass, built around JSON and an HTTP interface. Its first version was released in 2010.

ElasticSearch Cluster

Elasticsearch is a Java-based project that runs on the Java Virtual Machine, and each JVM server is considered an Elasticsearch node. To support scalability, Elasticsearch uses the concept of a cluster, in which multiple nodes running on one or more host machines are grouped together under a unique cluster name.
These clustered nodes hold the entire data set in the form of documents and provide indexing and search functionality over those documents.

Types Of Nodes:-

  • Master-Eligible Node
    Master nodes are meant for cluster/admin operations like shard allocation, cluster state maintenance, index/alias creation, etc.
  • Data Node
    Data nodes hold data and perform data-related operations such as CRUD, search, and aggregations.
  • Ingest Node
    Ingest nodes are able to apply an ingest pipeline to a document in order to transform and enrich the document before indexing.
  • Tribe Node
    It is a special type of coordinating node that can connect to multiple clusters and perform search and other operations across all connected clusters.

Shards and Replicas

  • Shards: An index can be subdivided into multiple pieces, and each piece is called a shard
  • Replicas: One or more copies of an index’s shards, called replica shards or simply replicas

By default, Elasticsearch allocates every index 5 primary shards and a single replica of each shard. That means every index gets 5 primary shards, and replicating each of them results in a total of 10 shards per index.
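
These defaults can be overridden when an index is created. As a small sketch (the index name my-index and the numbers are just placeholders), the create-index API accepts the shard and replica counts in its settings:

curl -H "Content-Type: application/json" -XPUT "localhost:9200/my-index" -d '
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'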


Types Of Tuning in ElasticSearch:- 

Index Performance Tuning

  • Use Bulk Requests
  • Use multiple workers/threads to send data to Elasticsearch
  • Unset or increase the refresh interval
  • Disable refresh and replicas for initial loads
  • Disable swapping
  • Give memory to the file-system cache
  • Use faster hardware
  • Increase the indexing buffer size (indices.memory.index_buffer_size, a share of the JVM heap)
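
Of these, bulk requests usually give the first big win. Below is a minimal sketch; the index name, file name and documents are placeholders, and depending on your Elasticsearch version a "_type" field may also be required in each action line:

File: bulk-logs.json (newline-delimited JSON, and the file must end with a newline)

{ "index" : { "_index" : "app-logs" } }
{ "message" : "first log line" }
{ "index" : { "_index" : "app-logs" } }
{ "message" : "second log line" }

curl -H "Content-Type: application/x-ndjson" -XPOST "localhost:9200/_bulk" --data-binary "@bulk-logs.json"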

Search Performance Tuning

  • Give memory to the file-system cache
  • Use faster hardware
  • Document modeling (documents should not be joined)
  • Search as few fields as possible
  • Pre-index data (give values to your search)
  • Shards might help with throughput, but not always
  • Use index sorting to speed up conjunctions
  • Warm up the file-system cache (index.store.preload)
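
As a small example of the last point, index.store.preload is a static index setting, so it has to be set at index creation time (or on a closed index). The index name and the list of file extensions to preload into the file-system cache are placeholders:

curl -H "Content-Type: application/json" -XPUT "localhost:9200/my-search-index" -d '
{
  "settings": { "index.store.preload": ["nvd", "dvd"] }
}'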

Why Is ElasticSearch Tuning Required?

Elasticsearch gives you moderate out-of-the-box performance for both search and ingestion of logs, maintaining a balance between the two. But when service utilization or the number of services in the infrastructure grows, logs grow in similar proportion. One could easily scale the cluster vertically, but that would increase the cost.
Instead, you can tune the cluster for your requirement (search or ingestion) while staying within the cost constraints.

Tune-up

How do you handle ingestion of 20k logs per second?
For such a high volume of ingestion into an Elasticsearch cluster, you will be trading off some search performance. The first step is to choose the right compute for the requirement; prefer memory over CPU. We are using m5.2xlarge (8 CPU / 32 GB) for data nodes and t3.medium (2 CPU / 4 GB) for masters.

Elasticsearch Cluster Size
Master – 3 (HA – To avoid the split-brain problem) or 1 (NON-HA)
Data Node – 2
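
A rough sketch of the node-level settings for such a layout, assuming a pre-7.x cluster (node roles and discovery settings changed in 7.x), could look like this; the cluster name is a placeholder:

File: elasticsearch.yml (dedicated master node)

cluster.name: logging-cluster
node.master: true
node.data: false
# with 3 master-eligible nodes, a quorum of (3 / 2) + 1 = 2 avoids split-brain
discovery.zen.minimum_master_nodes: 2

File: elasticsearch.yml (data node)

cluster.name: logging-cluster
node.master: false
node.data: true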

Configure JVM
The recommended JVM heap size for the cluster nodes is around 50% of the server's memory.
File: jvm.options
Path: /etc/elasticsearch/

             -Xms16g
             -Xmx16g

Update the system limits for heap, open files, and locked memory:

             - ES_HEAP_SIZE=16g
             - MAX_OPEN_FILES=99999
             - MAX_LOCKED_MEMORY=unlimited
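
For package installs these values typically live in /etc/sysconfig/elasticsearch or /etc/default/elasticsearch. Once the node is restarted, you can verify that the file descriptor limit was picked up, for example:

curl "localhost:9200/_nodes/stats/process?filter_path=**.max_file_descriptors"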


Dynamic APIs for tuning index performance
With respect to the index performance tuning parameters, below are the dynamic APIs (zero-downtime configuration updates) for tuning them.

Updating Translog
Every shard has a translog, which provides persistence by recording every index operation that has not yet been committed to the Lucene index.
Changes that happen after one commit and before another will be lost in the event of process exit or hardware failure.
To prevent this data loss, each shard has a transaction log or write-ahead log associated with it. Any index or delete operation is written to the translog after being processed by the internal Lucene index.

async – the translog is fsynced and committed in the background; in the event of a hardware failure, all acknowledged writes since the last automatic commit will be discarded.

Setting translog durability to async increases index write performance, but it does not guarantee recovery of acknowledged writes in case of a hardware failure.

 curl -H "Content-Type: application/json" -XPUT "localhost:9200/_all/_settings?timeout=180s" -d ' 
{

"index.translog.durability" : "async"
}'
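
Since logging clusters usually create new indices continuously (daily indices, for example), you may also want new indices to pick this setting up automatically via an index template. A hedged sketch, where the template name and pattern are placeholders (on 5.x the field is "template" instead of "index_patterns"):

curl -H "Content-Type: application/json" -XPUT "localhost:9200/_template/logs_translog" -d '
{
  "index_patterns": ["logstash-*"],
  "settings": { "index.translog.durability": "async" }
}'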

Timeout
The ?timeout parameter used in these settings calls controls how long the request waits for the cluster to acknowledge the update. Adjust it according to the number of indexes: the larger the number of indexes, the higher the timeout value should be.

Reduce the number of replicas to a minimum

If you need to ingest a large amount of data (the same scenario we have here), set the replica count to ‘0’. This is risky, as the loss of any shard will cause data loss because no replica exists for it. But at the same time, index performance increases significantly because each document has to be indexed just once, without replicas.
After you are done with the bulk ingestion, you can revert the setting, as shown after the example below.

curl -H "Content-Type: application/json" -XPUT "localhost:9200/_all/_settings?timeout=180s" -d ' 
{                        
"number_of_replicas": 0 
}'
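
Once the bulk load is finished, the same API can be used to add replicas back; 1 below is just an example, use whatever replica count your durability requirements demand:

curl -H "Content-Type: application/json" -XPUT "localhost:9200/_all/_settings?timeout=180s" -d '
{
"number_of_replicas": 1
}'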

Increase the Refresh Interval
Making indexes available for search is the operation called refresh, and it is a costly operation in terms of resources. Calling it too frequently can compromise index write performance.

By default, Elasticsearch refreshes an index every second, but only for indexes that have received a search request in the last 30 seconds.
This is the most appropriate configuration if you have little or no search traffic and want to optimize for indexing speed.

If your index participates in frequent search requests, Elasticsearch will refresh it every second. If you can afford to increase the time between when a document gets indexed and when it becomes visible, increasing index.refresh_interval to a greater value, e.g. 300s (5 min), might help improve indexing performance.

curl -H "Content-Type: application/json" -XPUT 'localhost:9200/_all/_settings?timeout=180s' -d
'{"index" : 
                 {    "refresh_interval" : "300s"    }              
}'

Decreasing the number of shards
Changing the number of shards can be achieved with the _shrink and _split APIs. As the names suggest, _split is used to increase the number of shards and _shrink to decrease it.
Note that the shard count of an existing index cannot be changed in place through the settings API; _shrink creates a new index with fewer primary shards, and it requires the source index to be made read-only with a copy of every shard relocated to a single node first. A sketch (the index names are placeholders):

curl -H "Content-Type: application/json" -XPUT "localhost:9200/_all/_settings?timeout=180s" -d '
 { 
     "number_of_shards": 1
 }'

When Logstash Is the Input
Logstash provides the following configurable options for tuning pipeline performance:

  • Pipeline Workers
  • Pipeline Batch Size
  • Pipeline Batch Delay 
  • Profiling the Heap

Configuring the above parameters helps increase the ingestion rate (index performance), since these parameters control how Logstash feeds data into Elasticsearch; a sketch follows below.
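
A hedged sketch of what this tuning could look like in logstash.yml and jvm.options; the numbers are illustrative starting points, not recommendations, since they depend on CPU count, event size and available heap:

File: logstash.yml

pipeline.workers: 8        # usually set to the number of CPU cores
pipeline.batch.size: 250   # events each worker collects before filtering/output
pipeline.batch.delay: 50   # ms to wait for new events when the batch is not full

File: jvm.options (Logstash heap, adjust while profiling)

-Xms4g
-Xmx4g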

Summary

Elasticsearch tuning is a complex and critical task, as a wrong setting can seriously damage your cluster or even bring the whole thing down. So be careful while modifying any parameters in a production environment.

Elasticsearch tuning can add a lot of value to the logging system while still meeting the cost constraints.

Happy Searching!

References: https://www.elastic.co
Image References: https://docs.bonsai.io/article/122-shard-primer  https://innerlives.org/2018/10/15/image-magic-drawing-the-history-of-sorcery-ritual-and-witchcraft/

The Concept Of Data At Rest Encryption In MySql

The word “data” has been crucial since the early 2000s, and over the span of these two decades it has only become more so. According to Forbes, Google believes that in the future every organisation will end up becoming a data company. And when it comes to data, security is one of the major concerns we have to face.

We have several common technologies to store data in today’s environment, like MySQL, Oracle, MSSQL, Cassandra, Mongo, etc., and these will keep changing in the future. But according to DataAnyz, MySQL still has a 33% share of the market. So here we are with a technique to secure our MySQL data.

Before getting deeper into this article, let us look at the possible (and combinable) approaches to securing MySQL data:

  1. Mysql Server hardening
  2. Mysql Application-level hardening
  3. Mysql data encryption at transit
  4. Mysql data at rest encryption
  5. Mysql Disk Encryption

You may explore all of these approaches, but in this article we will understand the concept of MySQL data at rest encryption, with a hands-on walkthrough too.

The concept of “Data at Rest Encryption” in MySQL was introduced in MySQL 5.7, initially with support for the InnoDB storage engine only, and it has evolved significantly over time. So let’s understand “Data at Rest Encryption” in MySQL.

What is “Data at Rest Encryption”  in MySql?

The concept of “data at rest encryption” uses a two-tier encryption key architecture, consisting of the two keys below:

  1. Tablespace key: an encrypted key which is stored in the tablespace header
  2. Master key: the master key is used to decrypt the tablespace keys

So let’s understand how it works.

Let’s say we have MySQL running with the InnoDB storage engine, and a tablespace is encrypted using a key, referred to as the tablespace key. This key is in turn encrypted using the master key and stored in the tablespace header.

Now when a request is made to access MySQL data, InnoDB uses the master key to decrypt the tablespace key present in the tablespace header. With the decrypted tablespace key, the tablespace is decrypted and made available for read/write operations.

Note: The decrypted version of a tablespace key never changes, but the master key can be rotated.

Data at rest encryption is implemented using the keyring_file plugin to manage and store the master key.

After understanding the concept of encryption and decryption, below are a few pros and cons of using data at rest encryption (DRE).

Pros:

  • A strong Encryption of AES 256 is used to encrypt the InnoDB tables
  • It is transparent to all applications as we don’t need any application code, schema, or data type changes
  • Key management is not done by DBA.
  • Keys can be securely stored away from the data and key rotation is very simple.

Cons:

  • Encrypts only  InnoDB tables
  • Can’t encrypt  binary logs, redo logs, relay logs on unencrypted slaves, slow log, error log, general log, and audit log

Though we can’t encrypt binary logs, redo logs, or relay logs in MySQL 5.7, MariaDB has implemented a mechanism to encrypt undo/redo logs, binary/relay logs, etc. by enabling a few flags in the MariaDB config file:

innodb_sys_tablespace_encrypt=ON
innodb_temp_tablespace_encrypt=ON
innodb_parallel_dblwr_encrypt=ON
innodb_encrypt_online_alter_logs=ON
innodb_encrypt_tables=FORCE
encrypt_binlog=ON
encrypt_tmp_files=ON

However, there are some limitations.

Let’s discuss these problems and a few solutions to them:

  1. On the host running MySQL, both the root user and the mysql user may access the key file (keyring file) present on the same system. To address this, we may keep our keys on a mountable drive which can be unmounted after MySQL has restarted.
  2. Data is not encrypted when it is loaded into RAM, so it can be dumped and read from memory.
  3. If MySQL is restarted with skip-grant-tables then again it is havoc, but this can also be mitigated by using an unmountable drive for the keyring.
  4. As the tablespace key remains the same, our security relies on master key rotation, which should be performed regularly to protect the master key.

NOTE: Do not lose the master key file; without it, the data cannot be decrypted and you will suffer data loss.

Doing Is Learning, so let’s try 

As a prerequisite, we need a machine with a MySQL server up and running. For data at rest encryption to work, we need to enable the following settings.

Enable file-per-table tablespaces with the help of the configuration file.

 
[root@mysql ~]#  vim /etc/my.cnf
[mysqld]

innodb_file_per_table=ON

Along with the above parameter, enable the keyring plugin and the keyring path. This parameter should always be at the top of the configuration so that it is loaded early when MySQL starts up. The keyring plugin is already shipped with MySQL server; we just need to enable it.

[root@mysql ~]#  vim /etc/my.cnf
[mysqld]
early-plugin-load=keyring_file.so
keyring_file_data=/var/lib/mysql/keyring-data/keyring
innodb_file_per_table=ON

Save the file and restart MySQL.

[root@mysql ~]#  systemctl restart mysql

We can check for the enabled plugin and verify our configuration.

mysql> SELECT plugin_name, plugin_status FROM INFORMATION_SCHEMA.PLUGINS WHERE plugin_name LIKE 'keyring%';
+--------------+---------------+
| plugin_name  | plugin_status |
+--------------+---------------+
| keyring_file | ACTIVE        |
+--------------+---------------+
1 rows in set (0.00 sec)


Verify that the keyring plugin is running and check its location:

mysql>  show global variables like '%keyring%';
+--------------------+-------------------------------------+
| Variable_name      | Value                               |
+--------------------+-------------------------------------+
| keyring_file_data  | /var/lib/mysql/keyring-data/keyring |
| keyring_operations | ON                                  |
+--------------------+-------------------------------------+
2 rows in set (0.00 sec)

Verify that we have enabled file per table 

MariaDB [(none)]> show global variables like 'innodb_file_per_table';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_file_per_table | ON   |
+-----------------------+-------+
1 row in set (0.33 sec)

Now we will test our set up by creating a test DB with a table and insert some value to the table using below commands 

mysql> CREATE DATABASE test_db;
mysql> CREATE TABLE test_db.test_db_table (id int primary key auto_increment, payload varchar(256)) engine=innodb;
mysql> INSERT INTO test_db.test_db_table(payload) VALUES('Confidential Data');

After successfully creating the test data, run the command below from the Linux shell to check whether you are able to read the InnoDB file for the created table, i.e. before encryption.

Along with that, we can see that our keyring file is also empty before encryption is enabled.

[root@mysql ~]#  strings /var/lib/mysql/test_db/test_db_table.ibd
infimum
supremum
Confidential DATA

 

At this point of time if we try to check our keyring file we will not find anything

[root@mysql ~]#  cat /var/lib/mysql/keyring-data/keyring
[root@mysql ~]# 

Now let’s encrypt our table with below command and check our InnoDB file and keyring file content.

mysql> ALTER TABLE test_db.test_db_table encryption='Y';
[root@mysql ~] strings /var/lib/mysql/test_db/test_db_table.ibd
0094ca6d-7ba9-11e9-b0d0-0800275716d42QMw

The above content makes it clear that the file data is no longer readable and the tablespace is encrypted. And since the keyring file was previously absent/empty, it must now contain some data.

Note: Take note of the master key and the file timestamp (we will implement key rotation shortly).

[root@mysql ~]  cat /var/lib/mysql/keyring-data/keyring
Keyring file version:1.0?0 INNODBKey-0094ca6d-7ba9-11e9-b0d0-0800275716d4-2AES???_gd?7m>0??nz??8M??7Yʹ:ll8@?0 INNODBKey-0094ca6d-7ba9-11e9-b0d0-0800275716d4-1AES}??x?$F?z??$???:??k?6y?YEOF
[root@mysql ~] ls -ltr /var/lib/mysql/keyring-data/keyring
-rw-r----- 1 mysql mysql 283 Sep 18 16:48 /var/lib/mysql/keyring-data/keyring

Given the known security concern of a compromised master key, we may use the master key rotation technique from time to time to protect our key.

mysql> alter instance rotate innodb master key;
Query OK, 0 rows affected (0.00 sec)

After this command, we realise that our key timestamp is changed now and we have a new key. 

[root@mysql ~] ls -ltr /var/lib/mysql/keyring-data/keyring
-rw-r----- 1 mysql mysql 411 Sep 18 18:17 /var/lib/mysql/keyring-data/keyring

Some Useful Commands

Below are some helpful commands we may use in an encrypted system 

1. List All the tables with encryption enabled 

mysql> SELECT * FROM information_schema.tables WHERE create_options LIKE '%ENCRYPTION="Y"%' \G;
*************************** 1. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: sample_db
TABLE_NAME: test_db_table
TABLE_TYPE: BASE TABLE
ENGINE: InnoDB
VERSION: 10
ROW_FORMAT: Dynamic
TABLE_ROWS: 0
AVG_ROW_LENGTH: 0
DATA_LENGTH: 16384
MAX_DATA_LENGTH: 0
INDEX_LENGTH: 0
DATA_FREE: 0
AUTO_INCREMENT: 2
CREATE_TIME: 2019-09-18 16:46:34
UPDATE_TIME: 2019-09-18 16:46:34
CHECK_TIME: NULL
TABLE_COLLATION: latin1_swedish_ci
CHECKSUM: NULL
CREATE_OPTIONS: ENCRYPTION="Y"
TABLE_COMMENT: 
1 row in set (0.02 sec)

ERROR: 
No query specified

2. Encrypt Tables in a Database 

mysql> ALTER TABLE db.t1 ENCRYPTION='Y';

3. Disable encryption for an InnoDB table

mysql> ALTER TABLE t1 ENCRYPTION='N';

Conclusion : 

We can encrypt data at rest by using the keyring plugin, and we can control and manage it with master key rotation. Creating an encrypted MySQL data setup is as simple as firing a few simple commands. Using an encrypted system is also transparent to services, applications, and users, with minimal impact on system resources. Going further, along with encryption of data at rest, we may also implement encryption in transit.

I hope you found this article informative and interesting. I’d really appreciate any and all feedback.

Achieve SSO in Privately Hosted Jenkins

Introduction

Providing OAuth 2.0 user authentication directly or via Google+ Sign-in reduces your CI overhead. It also provides a trusted and secure login system that’s familiar to users, consistent across devices, and removes the burden of users having to remember another username and password. One of the hurdles in implementing Gmail authentication is that the Google developer console and your Jenkins server must be able to talk to each other, which usually means Jenkins needs to be reachable on a public endpoint.

Resources Used

  • Privately Hosted Jenkins
  • Google developer console
  • Ngrok
In this blog, I will explain how to integrate the Gmail authentication feature into your privately hosted Jenkins server, so that you are freed from filling out the user form every time a new user is created.

Step 1: Setup Ngrok

NGROK
 
Ngrok is a multiplatform tunneling, reverse proxy software that establishes secure tunnels from a public endpoint such as the internet to a locally running network service, while capturing all traffic for detailed inspection and replay.
We are using Ngrok to expose our Jenkins service (running on port 8080) on a public URL.

 
Go to Google and search for "Download Ngrok".

Either log in with your Google account or sign up directly with Ngrok.

After logging in to Ngrok, download it.

After downloading Ngrok, go to the console, unzip the downloaded zip file and then move the binary to /usr/local/bin.
Note: The move is optional; we do it so that ngrok can be run from anywhere.

Go to the Ngrok UI page, copy the authentication command (containing your auth token) and run it in the console.
Note: Remove the leading "./" because we moved the ngrok binary to /usr/local/bin.

The major configuration for Ngrok is now done. Now type the command:
ngrok http 8080
This assumes that Jenkins is running on port 8080.

Ngrok now exposes our Jenkins service on a public URL.

Copy this URL; we will use it in the Google developer console.

Note: Keep this terminal up and running (don't press Ctrl+C).

Step 2: Setup Google Developer Console

Go to Google and search for "Google developer console".

After signing in to the Google developer console, you will be redirected to its UI screen.
Go to Select a project → New Project

Give the project a name. Here I will use "JenkinsGmailAuthentication" and create the project. Creating a project takes a minute or two.

After the project is created, you will be redirected to the console UI. Now click on the "Credentials" tab in the left sidebar.

Next, go to the OAuth consent screen tab and fill in the entries. Here I will set the Application name to "JenkinsGmailAuthentication".

The important part of the Google developer console setup is the public URL we created using Ngrok. Copy the public URL into Authorized domains, and note that you must remove the "http://" prefix there.

After setting up the OAuth consent screen, go to the "Credentials" tab → Create Credentials → OAuth client ID.

Select the application type as Web Application and give it the name "JenkinsGmailAuthentication".
The major part of creating the credential is the Authorized JavaScript origins and Authorized redirect URIs; for the Google Login plugin the redirect URI is typically your public Jenkins URL followed by /securityRealm/finishLogin.

Copy the Client ID and Client Secret, because we are going to use these in Jenkins.
 

Step 3: Setup Jenkins

I am assuming that Jenkins is already installed on your system.
Go to Manage Jenkins → Manage Plugins → Available

Search for "Google Login Plugin" and install it.

Go to Manage Jenkins → Configure Global Security

The major part of the Jenkins setup is Configure Global Security.
Check Enable security → Login with Google, paste the Client ID and Client Secret generated in the Create Credentials step, and save.

Up to here, we are done with the setup part.
Now click the login button in the Jenkins UI and you will be redirected to Gmail for login.

Select the account with which you want to log in.

After selecting the account you will be redirected to Jenkins, logged in as the selected user.

You may face a problem when you log in again:
Log out from the current user and log in again.

After being redirected to Gmail, select another user.

After selecting the user you will be redirected to an error page showing: HTTP ERROR 404.

Don't worry, you just have to remove "securityRealm/" from the URL or enter "localhost:8080" again.

You are now logged in as the selected user.

So now you know how to do Gmail authentication between the Google developer console and Jenkins when they are not directly reachable from each other.
The main bridge between the two is Ngrok, which exposes our privately hosted Jenkins to the outside internet.

Jenkins Pipeline Global Shared Libraries

Although the coding language used here is Groovy, Jenkins does not allow us to use Groovy to its fullest, so we can say that Jenkins Pipelines are not exactly Groovy. Classes that you write in src are processed in a “special Jenkins way” and you have no control over this. Depending on the scenario, objects in Groovy don’t behave as you would expect objects to work.

Our view is that putting all pipeline functions in vars is the much more practical approach. While there is no good way to do inheritance there, and we wanted to use Jenkins Pipelines “the right way”, it has turned out to be far more practical to use vars for global functions.

Practical Strategy
As we know, Jenkins Pipeline’s shared library support allows us to define and develop a set of shared pipeline helpers in a repository and provides a straightforward way of using those functions in a Jenkinsfile. This simple example will illustrate how you can provide input to a pipeline with a simple YAML file so that you can centralize all of your pipelines into one library, together with an example app that uses it.

Directory Structure

You would have the following folder structure in a git repo:

└── vars
    ├── opstreePipeline.groovy
    ├── opstreeStatefulPipeline.groovy
    ├── opstreeStubsPipeline.groovy
    └── pipelineConfig.groovy

Setting up Library in Jenkins Console.

This repo would be configured under Manage Jenkins > Configure System in the Global Pipeline Libraries section. In that section Jenkins requires you to give this library a name, for example opstree-library.

Pipeline.yaml

Let’s assume that the project repository has a pipeline.yaml file in the project root that provides input to the pipeline:

ENVIRONMENT_NAME: test
SERVICE_NAME: opstree-service
DB_PORT: 3079
REDIS_PORT: 6079

Jenkinsfile

Then, to utilize the shared pipeline library, the Jenkinsfile in the root of the project repo would look like:

@Library ('opstree-library@master') _
opstreePipeline()

PipelineConfig.groovy

So how does it all work? First, the following function is called to get all of the configuration data from the pipeline.yaml file:

def call() {
  Map pipelineConfig = readYaml(file: "${WORKSPACE}/pipeline.yaml")
  return pipelineConfig
}

opstreePipeline.groovy

You can see the call to this function in opstreePipeline(), which is called by the Jenkinsfile.

def call() {
    node('Slave1') {

        stage('Checkout') {
            checkout scm
        }

         def p = pipelineConfig()

        stage('Prerequisites'){
            serviceName = sh (
                    script: "echo ${p.SERVICE_NAME}|cut -d '-' -f 1",
                    returnStdout: true
                ).trim()
        }

        stage('Build & Test') {
                sh "mvn --version"
                sh "mvn -Ddb_port=${p.DB_PORT} -Dredis_port=${p.REDIS_PORT} clean install"
        }

        stage ('Push Docker Image') {
            docker.withRegistry('https://registry-opstree.com', 'dockerhub') {
                sh "docker build -t opstree/${p.SERVICE_NAME}:${BUILD_NUMBER} ."
                sh "docker push opstree/${p.SERVICE_NAME}:${BUILD_NUMBER}"
            }
        }

        stage ('Deploy') {
            echo "We are going to deploy ${p.SERVICE_NAME}"
            sh "kubectl set image deployment/${p.SERVICE_NAME} ${p.SERVICE_NAME}=opstree/${p.SERVICE_NAME}:${BUILD_NUMBER}"
            sh "kubectl rollout status deployment/${p.SERVICE_NAME} -n ${p.ENVIRONMENT_NAME}"
        }
    }
}

You can easily follow the logic here: the pipeline reads from pipeline.yaml which environment the developer wants to deploy to and which DB and Redis ports need to be used.

Benefits

The benefits of this approach are many, some of them are as mentioned below:

  • Developers no longer have to worry about how to write Groovy pipeline code.
  • Structure of the Pipeline.yaml is really flexible, where entire data structures can be passed as input to the pipeline.
  • Code redundancy saved to a large extent.

Ideally, every project’s Jenkinsfile could look the same, like this:

@Library ('opstree-library@master') _
opstreePipeline()

and opstreePipeline() would just read the project type from pipeline.yaml and dynamically run the corresponding function, like opstreeStatefulPipeline() or opstreeStubsPipeline(). But since pipelines are not exactly Groovy, this isn’t possible. So one of the drawbacks is that each project has to have a slightly different-looking Jenkinsfile. A solution is in progress! So, what do you think?

Reference links: 
Image: Google image search (jenkins.io)

AWS RDS cross account snapshot restoration

Many a time you may have faced the problem where your production infra is in one AWS account and non-prod in another, and you are required to restore an RDS snapshot to the non-prod account for testing.

Recently I got a task to restore an RDS snapshot from my prod account to a different account for testing purposes. It was a very interesting and new task for me, and I was in awe of how AWS anticipates the challenges we may face in real life and provides solutions for them.

For those who are not aware of RDS: it is a relational database service by Amazon Web Services (AWS). It is a managed service, so we don’t have to worry about the underlying operating system or database software installation; we just have to use it.

Amazon RDS creates a storage volume snapshot of your DB instance, backing up the entire DB instance and not just individual databases. As I said, we have to copy and restore an RDS snapshot to a different AWS account. There is a catch: you can directly copy a snapshot to a different region within the same AWS account, but to copy it to a different AWS account you need to share the snapshot with that account and then restore it from there. So let’s begin.

To share an automated DB snapshot, create a manual DB snapshot by copying the automated snapshot, and then share that copy.

Step 1: Find the snapshot that you want to copy, and select it by clicking the checkbox next to its name. You can select a “Manual” snapshot, or one of the “Automatic” snapshots that are prefixed by “rds:”.

Step 2: From the “Snapshot Actions” menu, select “Copy Snapshot”.

Step 3: On the page that appears, select the target region. In this case, since we have to share this snapshot with another AWS account, we can keep the existing region.

  • Specify your new snapshot name in the “New DB Snapshot Identifier” field. This identifier must not already be used by a snapshot in the target region.
  • Check the “Copy Tags” checkbox if you want the tags on the source snapshot to be copied to the new snapshot.
  • Under “Encryption”, leave “Disable Encryption” selected.
  • Click the “Copy Snapshot” button.

Step 4: Once you click on “Copy Snapshot”, you can see the snapshot being created.

Step 5: Once the manual snapshot is created, select the created snapshot, and from the “Snapshot Actions” menu, select “Share Snapshot”.

Step 6: Define the “DB snapshot visibility” as private and add the “AWS account ID” to which we want to share the snapshot and click on save.

Till this point we have shared our db snapshot to the aws account where we need to restore the db.
Now login to the other aws account and go to RDS console and check for snapshot that was shared just recently.

Step 7: Select the snapshot and from the “Snapshot Actions” menu select “Restore Snapshot”.

Step 8: From here we just need to restore the db as we do normally. Fill out the required details like “DB Instance class”, “Multi-AZ-Deployment”, “Storage Type”, “VPC ID”, “Subnet group”, “Availability Zone”, “Database Port”, “DB parameter group”, as per the need and requirement.

Step 9: Finally click on “Restore DB instance” and voila !!, you are done.

Step 10: You can see the db creation in process. Finally, you have restored the DB to a different AWS account !!
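
If you prefer the CLI over the console, a rough sketch of the same flow with the AWS CLI would look like this; the snapshot names, account IDs, region and instance class below are placeholders:

# In the source (prod) account: copy the automated snapshot to a manual one, then share it
aws rds copy-db-snapshot \
    --source-db-snapshot-identifier rds:mydb-2019-09-18-06-10 \
    --target-db-snapshot-identifier mydb-manual-copy

aws rds modify-db-snapshot-attribute \
    --db-snapshot-identifier mydb-manual-copy \
    --attribute-name restore \
    --values-to-add 123456789012

# In the target (non-prod) account: restore a new instance from the shared snapshot (use its ARN)
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier mydb-restored \
    --db-snapshot-identifier arn:aws:rds:ap-south-1:111122223333:snapshot:mydb-manual-copy \
    --db-instance-class db.t3.medium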

Conclusion:

So there you go. Everything you need to know to restore a production AWS RDS into a different AWS account. That’s cool !! Isn’t it ?, but I haven’t covered everything. There is a lot more to explore. We will walk through RDS best practices in our next blog, till then keep exploring our other tech blogs !!.

Image source: https://unsplash.com/photos/lRoX0shwjUQ


Why I love pods in Kubernetes? Part – 1

When I began my journey of learning Kubernetes, I always wondered why Kubernetes made the pod its smallest entity and not the container. But when I started diving deeper, I realized there is a big rationale behind it, and now I thank Kubernetes for making the pod the only such object, not containers.

After being inspired by the working of a Pod, I would like to share my experience and knowledge with you guys.


What exactly Pod means?

Literally, a pod is the pea pod that holds the beans, and following the same analogy, in Kubernetes a pod is a logical object that holds one or more containers.
The bookish definition could be: a pod represents a request to execute one or more containers on the same node.

Why Pod?

The question that needs to be raised is: why the pod? So let me clear this up. Pods are considered the fundamental building blocks of Kubernetes, because all Kubernetes workloads, like Deployments, ReplicaSets or Jobs, are eventually expressed in terms of pods.

Pods are the one and only objects in Kubernetes that results in the execution of containers which means No Pod No Containers !!!

Now after the context setting over pod I would like to answer my beloved question:- Why Pod over container??

My answer is why not 🙂 

Let’s take an example. Suppose you have an application which generates two types of logs: an access log and an error log. Now you have to add a log shipper agent. In the case of a plain container, you would install the log shipper in the container image. Then you get another request to add application monitoring. So again you have to rebuild the container image, this time with an APM agent in it.
Don’t you think this is quite an untidy way to do it? Of course it is. Why should I have to add these things to my application image? It makes my image bulky and difficult to manage.

What if I tell you that Kubernetes has its own way of dealing situations like this. 

Yup, the solution is a sidecar. Just like in real life, if I have a two-seater bike and I want to take three people on a ride, I add a sidecar to my bike so two more people can come along.
In a similar fashion, I can do the same thing with Kubernetes. To solve the above problem I will just run 3 containers (application, log shipper and APM agent) in the same pod. Now the question is how they will share data between them and how the networking magic will happen.
The answer is quite simple: containers within the pod share the pod’s IP address and can talk to each other over localhost. For storage, we can also share volumes across the containers in a pod.

The architecture would be something like this:-

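A minimal sketch of such a pod, where the image names and paths are purely illustrative, could be:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecars
spec:
  volumes:
    - name: log-volume            # shared between the app and the log shipper
      emptyDir: {}
  containers:
    - name: application
      image: my-app:1.0           # placeholder application image
      volumeMounts:
        - name: log-volume
          mountPath: /var/log/app # the app writes its access and error logs here
    - name: log-shipper
      image: my-log-shipper:1.0   # placeholder sidecar image
      volumeMounts:
        - name: log-volume
          mountPath: /logs        # the sidecar reads the same log files
    - name: apm-agent
      image: my-apm-agent:1.0     # placeholder sidecar; talks to the app over localhost

All three containers share the pod's network namespace and the log-volume, which is exactly the sidecar pattern described above.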

Now another interesting question arises: when to use a sidecar and when not to.

We should not keep the application and its database together in the same pod. The reason is that Kubernetes does not scale a container, it scales a pod. So when autoscaling happens, it scales the application as well as the database, which may not be required.

Instead, we should keep log shippers, health-check containers and monitoring agents as sidecars, because when the application scales, these agents also need to scale with it.

Now I am assuming you are also madly in love with the pods.

To dive deeper into pods, stay tuned for the next part of this blog, Why I love pods in Kubernetes? Part – 2. In the next part, I will discuss the different phases and the lifecycle of a pod, and how pods make our lives really smooth.
Thanks for reading. I’d really appreciate any and all feedback; please leave a comment below if you have any.

Cheers till the next time.

Docker-Compose As A Bundled Application

When Docker was released as a new containerization tool, it took the market by storm. With its lightweight images, multi-OS support, and ability to ship containers, its popularity only soared. I have been using it for more than six months now, and I can see why. Hypervisors, another type of virtualization tool, have always been hard on hardware, meaning they require a lot of resources to run, which increases the cost of running applications far beyond those running in containers. This is the problem Docker solved, hence its popularity. The Docker engine just sits on the host OS and translates the instructions from an application to the underlying OS. It does not need an extra layer of virtual OS, just the binaries and libraries of the application bundled in the image. Right? Now, hold on to that thought. We have all been working with Docker and, by extension, with docker-compose. Why? Because it makes our job easy: we are spared from typing hundreds of ad-hoc commands in the terminal to set up a slightly or very complicated application with certain dependencies. We can just describe it in a `docker-compose.yml` file and our job is done. However, the problem arises when we have to share that compose file:

  • Other users might need to use the file in a different environment, so they will need to edit all the values pertaining to their need, manually, and keep separate compose files for each environment.
  • Troubleshooting various configuration issues can be a tedious task since there is no single place where the configuration of the application can be stored. Changes will have to be made in the file.
  • This also makes communication between Dev and Ops team more tricky than it has to be resulting in communication gap and time wastage.

To get a clearer picture of the issue, consider this: we have a compose file plus configuration for each environment, and we make changes according to each environment's needs in different compose files, which can be a long manual task depending on the size of the project.


All of this points to the fact that there is no way to bundle the applications that use those efficiently bundled Docker images. See the irony here? Well, there “was” no way, until there was. Enter ‘docker-app’. This relatively new tool is the answer to packaging docker-compose applications. I came across it when I was struggling to re-use, in another environment, a docker-compose application I had written. As soon as I read about it, I had to try it, which I did and loved. It made the task much easier, as it provides a template of the compose file and a key-value store for environment-dependent parameters.


Now we have an artefact with the extension ‘.dockerapp’. We can pass configuration values through the CLI, through files, or both, and docker-app will render the compose file according to those values.

Let us now go through an example of how the docker app works. I am going to deploy a dummy application Spring3hibernate from Opstree Github repository in QA env and later in PROD by making simple configuration changes.
Installing docker-app is easy, though, there is one thing one should keep in mind: it can be installed as a plugin in docker-CLI or as standalone CLI tool itself. I will be installing it as a standalone CLI tool on linux. If you wish to install it as a plugin to docker-CLI and/or on another OS, visit their Github page: https://github.com/docker/app (Also, please visit github page for basics)
Before continuing, please ensure you have docker-CLI and docker-compose installed.
Please follow below steps to install docker-app:

$ export OSTYPE="$(uname | tr A-Z a-z)"
$ curl -fsSL --output "/tmp/docker-app-${OSTYPE}.tar.gz" \
"https://github.com/docker/app/releases/download/v0.8.0/docker-app-${OSTYPE}.tar.gz"
$ tar xf "/tmp/docker-app-${OSTYPE}.tar.gz" -C /tmp/
$ install -b "/tmp/docker-app-standalone-${OSTYPE}" /usr/local/bin/docker-app

Create a new directory in your home, we’ll call it app home:

$ cd ~
$ mkdir spring3hibernate-app
$ cd spring3hibernate-app/

Now, clone the app from Opstree Github repository. This app needs only mysql as a dependency.

$ git clone https://github.com/opstree/spring3hibernate.git

We need to update database properties file and nginx config file with below contents respectively:

$ vim ~/spring3hibernate-app/spring3hibernate/src/main/resources/database.properties

Replace below content over there:

database.driver=com.mysql.jdbc.Driver
database.url=jdbc:mysql://mysql:3306/employeedb
database.user=admin
database.password=password
hibernate.dialect=org.hibernate.dialect.MySQLDialect
hibernate.show_sql=true
hibernate.hbm2ddl.auto=update
upload.dir=c:/uploads

For nginx conf file:

$ vim ~/spring3hibernate-app/spring3hibernate/nginx/default.conf
server {
    listen       80;
    server_name  localhost;

    location / {
        stub_status on;
        proxy_pass http://springapp1:8080/;

    }
# redirect server error pages to the static page /50x.html
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/html;
    }

}

Move ‘default.conf’ to ~/spring3hibernate-app/spring3hibernate/nginx/conf/qa/ as we have different conf file for PROD which goes to ~/spring3hibernate-app/spring3hibernate/nginx/conf/prod/

upstream s3hbackend {
    server springapp1:8080;
    server springapp2:8080;
}
server {
       listen 80;
       location / {
           stub_status on;
           proxy_pass http://s3hbackend;
       }
  
       # redirect server error pages to the static page /50x.html
       error_page   500 502 503 504  /50x.html;
       location = /50x.html {
           root   /usr/share/nginx/html;
       }

}

This is the configuration for the nginx load balancer. Remember this, we’ll use it later. Let’s create our docker-app now, make sure you are in the app home directory
when executing this command:

$ docker-app init --single-file s3h

This will create a single file named s3h.dockerapp which will look like this: 

# This section contains your application metadata.
# Version of the application
version: 0.1.0
# Name of the application
name: s3h
# A short description of the application
description:
# List of application maintainers with name and email for each
maintainers:
  - name: ubuntu
    email:


---
# This section contains the Compose file that describes your application services.
version: "3.6"
services: {}


---
# This section contains the default values for your application parameters.

{}

As you can see, this file is divided into three parts: metadata, compose, and parameters. They are all in one file because we used the --single-file switch. We can split them into multiple files by using the docker-app split command in the app home directory, and docker-app merge will put them back into one file. Now, for QA, we have the following configuration in the s3h.dockerapp file:

version: 0.1.0
name: s3h
description:
maintainers:
  - name: atbk5
    email: adeel.ahmad@opstree.com


---
version: "3.7"
services:
  mysql:
    image: mysql:5.7
    container_name: mysql
    environment:
      MYSQL_ROOT_PASSWORD: ${mysql.env.rootpass}
      MYSQL_DATABASE: ${mysql.env.database}
      MYSQL_USER: ${mysql.env.user}
      MYSQL_PASSWORD: ${mysql.env.userpass}
    restart: always
    networks:
      - backend
    volumes:
      - db_data:/var/lib/mysql


  spring1:
    depends_on:
      - mysql
    build:
      context: ./spring3hibernate/
      dockerfile: Dockerfile
    container_name: springapp1
    restart: always
    networks:
      - backend
      - frontend


  spring2:
    depends_on:
      - mysql
    build:
      context: ./spring3hibernate/
      dockerfile: Dockerfile
    container_name: springapp2
    restart: always
    networks:
      - backend
      - frontend
    x-enabled: ${spring.app2}


  nginx:
    depends_on:
      - spring1
    image: nginx:alpine
    container_name: proxy
    restart: always
    networks:
      - frontend
    volumes:
      - ${nginx.conf}:/etc/nginx/conf.d
    ports:
      - ${nginx.port}:80
    x-enabled: ${nginx.status}


networks:
  frontend:
  backend:


volumes:
  db_data:


---
mysql:
  env:
    rootpass: password
    database: employeedb
    user: admin
    userpass: password
nginx:
  conf: /home/ubuntu/dockerApp/spring3hibernate/nginx/conf/qa
  port: 81
  status: true
spring:
  app2: false

As mentioned before, first part contains app metadata, second part contains actual compose file with lots of variables, and last part contains values of those variables. Special mention here is x-enabled variable, docker-app provides functionality to temporarily disable a service using this variable. Now, try a few commands:

$ docker-app inspect

It will produce summary of whole app.

$ docker-app render

It will replace all variables with their values and will produce a compose file

$ docker-app render --set nginx.status="false"

It will remove nginx from docker-app compose as well as deploy

$ docker-app render | docker-compose -f - up

It will spin up all the containers according to rendered compose file. We can see the application running on port 81 of our machine.

$ docker-app --help

To check out more commands and play around a bit.
At this point, it is better to create two directories in the app home: qa and prod. Create a file in qa: qa-params.yml, and another file in prod: prod-params.yml. Copy all the parameters from the s3h.dockerapp file above into qa-params.yml (or not). More importantly, copy the parameter changes below into prod-params.yml:

mysql:
  env:
    rootpass: password
    database: employeedb
    user: admin
    userpass: password
nginx:
  conf: /home/ubuntu/dockerApp/spring3hibernate/nginx/conf/prod
  port: 80
  status: true
spring:
  app2: true

We are going to loadbalance springapp1 and springapp2 in PROD environment, since we have enabled springapp2 using x-enabled parameter. We have also changed nginx conf bind path to the new conf file and host port for nginx to 80 (for Production). All so easily. Run command:

$ docker-app render --parameters-file ./prod/prod-params.yml

This command will produce a compose file ready for production deployment. Now run:

$ docker-app render --parameters-file ./prod/prod-params.yml | docker-compose -f - up

And production is deployed … Visit port 80 of your localhost to verify. What’s more exciting is that we can also share our docker-apps through docker hub, we can tag the app and push it to our remote registry as images after logging in:

$ docker login

Provide your username and password for docker hub, create an account if not yet created.

$ docker-app push --tag atbk5/s3h.dockerapp:latest

If we wish to upload additional files as well, we will have to split our project using docker-app split and put additional files in the directory before pushing. The additional files will go as attachments which can be accessed later.

Conclusion

With the arrival of docker app, our large, composite, and containerized applications can also be shipped and re-used as images. That is cool. But there’s something cooler which we haven’t explored yet. Deploying our docker-apps on kubernetes with the goal of exploring how far in management, and how optimal in delivery, we can go with our applications. Let’s keep this as a topic for the next blog. Until then, have a nice one. 🙂

Image Source: https://reflectoring.io/externalize-configuration/