Why is GitOps so exciting?

Initially, we had the DevOps framework, in which the Development and Operations teams collaborated to create an agile development ecosystem. Then came a new wave called “DevSecOps”, which integrated security into the existing DevOps process. Nowadays a new term, “GitOps”, is gaining popularity because of its “single source of truth” nature. Its fame has reached the level of being a trending topic at KubeCon.

So, like everyone else, I got curious and started reading about it, and then, thanks to my company, I also got the chance to implement it for one of our projects.

Also, a lot of people think that GitOps is only for containers and their orchestration tools. This isn’t true, because GitOps is a philosophical approach. Yes, I agree GitOps is primarily used in containerized workflows because of their high level of agility, but we can use GitOps in non-containerized environments as well, using plain CI/CD tools like Jenkins.

What is GitOps?

In a nutshell, GitOps entails using Git and Pull/Merge Requests to manage all stages of software development. Whereas developers have traditionally used Git to manage application source code, GitOps uses Git to share information and coordinate activities like infrastructure provisioning and application deployment.

In other words, GitOps is a methodology that uses Git as a single source of truth for declarative infrastructure and applications.

GitOps Feature set

GitOps provides a lot of features; we cannot list them all, but we can highlight some of the important ones.

  • Single Source of Truth:- The first and most important principle of GitOps. It states that Git is always right. You can understand the whole system just by looking at Git, because it has all the ingredients right there.
  • Everything as Code:- It states that everything should be kept in the form of code, be it the application or any other component the application needs. In most cases, infrastructure is also defined as code, for example cloud VMs, Docker containers and Kubernetes Deployments.
  • CI/CD Automation:- This is where the magic happens. Build, test and deploy happen automatically, relying on the truth in the repository. Infrastructure creation and application deployment are part of this, driven by the infrastructure and configuration kept as code (a minimal sketch of such a sync step follows below).
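To make the idea concrete, here is a minimal sketch of what a single sync step could look like when driven by a plain CI/CD tool such as Jenkins. The repository URL, the k8s/ manifest path and the use of kubectl are assumptions for illustration, not part of any particular GitOps tool.

# A minimal, illustrative GitOps-style sync step (all names below are hypothetical).
git clone --depth 1 https://example.com/our-org/app-config.git
cd app-config

# Apply whatever is declared in Git; the cluster converges to this state.
kubectl apply -f k8s/

# Any manual change on the cluster that is not committed to Git gets
# overwritten on the next sync run; this is the "self-healing" behaviour.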

Why GitOps?

One of the major benefits of using GitOps is its self-healing nature. Since Git is the source of truth for the whole system, if someone unintentionally modifies the system at any time, the change is reverted to keep it in sync with Git.

Rollback is also quite easy in this scenario. In most cases, the master branch reflects the state of the system. If there is any kind of failure in the application and it needs to be rolled back, this can be done simply by reverting to a previous Git state.

It also solves the issues of audit trail and transparency, because everything is done via Pull Requests. If something doesn’t work, you have a single place to look: the commit history.


Conclusion

Since GitOps has become so popular, open-source communities are developing tools for it. As these tools mature, implementing GitOps will become much easier.

Now I am assuming that you are also fascinated by GitOps.

Soon I will be writing a blog on GitOps implementation on Kubernetes using Jenkins.
Thanks for reading. I’d really appreciate any and all feedback, so please leave a comment below.

Cheers till next time!!

Recap Amrita InCTF 2019 | Part 2


Amrita InCTF 10th Edition is an offline CTF (Capture the Flag) event hosted by Amrita University. In our previous blog, we discussed the talks from the first day. In this one, we’ll shed some light on the talks from the second day.

Talk 1: Exploring attack surfaces in embedded devices by Vivek 

IoT has become popular in everyday household items like fridges, washing machines, cameras and televisions. You can access these devices remotely, and some of them can communicate with each other. These connections become entry points for attackers.

Nowadays commodity devices are getting intelligent and SoCs are pretty cheap (a $5 Raspberry Pi Zero), so they are all around us, from watches to smart glasses, and they are all getting connected to a massive IoT network, all connected and potentially vulnerable.

Below are some insights on key topics discussed in the talk around IoT security.

BLE Security Testing:- Low cost and ease of implementation have led Bluetooth Low Energy (BLE) to be widely used among IoT devices and applications like wearable sensors, light bulbs and medical devices.

BLE has three main vulnerabilities.

  • Eavesdropping:- Eavesdropping refers to a third-party device listening in on the data that’s being exchanged between two paired devices.
  • Man in the Middle Attacks (MITM): Man in the middle attacks involve a third party device impersonating a legitimate device, tricking two legitimate devices into believing that they’re connected to each other, when in reality, the legitimate devices are connected to the impersonator.
  • Denial of Service & Fuzzing Attacks: DoS attacks expose a system to the possibility of frequent crashes, leading to complete exhaustion of its battery. Fuzzing attacks also lead to system crashes, as an attacker may send malformed or non-standard data to a device.

ZigBee Security Testing:- ZigBee is a wireless communication protocol used to connect sensors, door locks, electric meters and traffic management systems. The protocol is open at the network level, so when devices start connecting they send out beacon requests.

NFC/RFID cloning:-

With basic RFID cards there is effectively no security: they have been hacked, there is no protection of data and no privacy, everything is transmitted in the clear, and they are not resistant to sniffing or other common attacks.

With more secure cards, to access sensitive information you have to provide the right key for that sector of memory; otherwise, it will show up blank. Even though these cards are a lot more secure, once you know the encryption algorithm you can decrypt them and access the sensitive information. With that, people can also clone these cards relatively easily.

Hello Barbie hardware hacking:- The doll uses Wi-Fi to transmit audio of children talking, much like Siri or Google Assistant. The toy uses a digital ID that attackers can abuse, potentially letting them spy on the chatter between a doll and a server. Phones with the companion app will automatically connect to any Wi-Fi network that includes “Barbie” in its name.

Talk 2: APT attacks by Shaunak

Shaunak is the CEO of Zacco, a cyber security company. He talked about his experience with APT attacks, gave a brief intro to what Advanced Persistent Threat attacks are, and shared his experience of discovering an APT attack within an organization which had no clue about it.

What is an APT attack:-

APT attacks are performed at a large scale. They are cyber crimes directed at business and political targets. Organized crime groups may sponsor advanced persistent threats to gain information they can use to carry out criminal acts for financial gain.

How an APT attack works:-

  • Gain access: APT groups gain entry to a target system, for example through social engineering techniques or application vulnerabilities.
  • Establish a foothold: After gaining access to the target, the APT group does further reconnaissance and creates networks of backdoors and tunnels that they can use to move around unnoticed. APTs may use advanced malware techniques such as code rewriting to cover their tracks.
  • Gain even greater access: Once inside a network, APT actors may use methods such as password cracking to gain administrative rights, so they can get high-level access.
  • Move laterally: After getting admin access, they can move around the enterprise network and attempt to attack other servers.
  • Stage the attack: At this point, the hackers centralize, encrypt and compress the data so they can exfiltrate it.
  • Take the data: The attackers transfer the data to their own systems.
  • Remain until they’re detected: The APT group can repeat this process for a long time, until they are detected.

Talk 3: Intel L1 Terminal Fault Vulnerability by Reno Robert

Reno Robert talked about the Intel L1 Terminal Fault (L1TF) vulnerability. Most Intel processors are affected by it, and it can allow attackers to access sensitive information stored in the Level 1 CPU cache.

This may include data from the operating system, kernel, hypervisor or the neighboring virtual machine.

It may allow malicious code executing on one thread to access data from the L1 cache of another thread within the same core.

  • L1TF system information:- An attacker can use this vulnerability to read any physical memory location that is cached in the L1 data cache of the processor.
  • Page-table entries:- The memory addresses used by both user space and the kernel do not point directly into physical memory. Instead, the hierarchical page-table structure is used to translate between virtual and physical addresses. 
  • Flush L1 data cache on security domain transition:- The L1D is shared by all LPs on the same physical core. This means disclosure can be prevented by flushing the L1 data cache when transitioning between security domains. Intel has provided new capabilities through a microcode update that supports an architectural interface for flushing the L1 data cache.

This was all from the day 2 talks. Come back next Tuesday for the talks from Day 3, and in the final segment of this series we’ll write about the attack/defense and jeopardy CTF experience.

We’ll be more than happy to hear from you in comments section regarding any feedback or criticism.

Stay Tuned, Happy Blogging!

References:
https://blog.attify.com/the-practical-guide-to-hacking-bluetooth-low-energy/
https://msrc-blog.microsoft.com/2018/08/14/analysis-and-mitigation-of-l1-terminal-fault-l1tf/
https://research.kudelskisecurity.com/2017/11/21/zigbee-security-basics-part-3/

Recap Amrita InCTF 2019 | Part 1

Amrita InCTF 10th Edition is an offline CTF (Capture the Flag) event hosted by Amrita University at their Amritapuri campus, 10 km from Kayamkulam in Kerala, India. In this year’s edition, two people from Opstree got invited to the final round after roughly two months of solving challenges online. The dates for the final round were 28th, 29th and 30th December 2019. The first two days consisted of talks by various people from the industry, and the third day was kept for the final competition. In this three-blog series starting now, we’d like to share all the knowledge, experiences and learning from this three-day event.

Talk from Cisco

The hall was filled with a little more than 300 people, many of whom were college students ranging from sophomore year up to final and pre-final year. Also, to our surprise, there were roughly 50+ school students sitting ready to compete in the final event as well. The initial talk by Cisco was refreshing and very insightful for everyone present in the room. The talk majorly focused on how technology is changing lives all around the world, be it machine learning helping doctors treat patients faster, drones being used to put out fires, or IoT-enabled systems providing efficient irrigation in remote areas. The speakers also made the point that learning a broader range of technologies and tools serves you longer than in-depth knowledge of a limited set.
One thing that really stuck with me was to never learn a technology just for the sake of it or for the hype around it, but to learn with a thought on how it can solve a problem around us.

Talk Title: Cyberoam startup and experiences - Hemal Patel

Hemal Patel talked about a couple of his startups and how he has always learned through failures. The talk was full of experiences, and it is always serene to listen to someone telling how they failed over and over again, which eventually led them to succeed at whatever they are doing today. He talked about Cyberoam, now a Sophos company, which secures organizations with its wide range of product offerings at the network gateway. The talk went on to give us an overview of how business is done with different governments all around the world, how entrepreneurship is so much more than just tackling a problem at a business level, and how Cyberoam ended up making the product that they have today.

Talk on Security by Cisco – Radhika Singh and Prapanch Ramamoorthy 

This was a wide-ranging talk about a lot of things affecting us. We’ll try to list most of it here.

The talk started out by exploring free/open WiFi. Though it has the huge benefit of being free, it comes with a lot of risks as well. To name a few:

→ Sniffing

→ Snooping 

→ Spoofing

These are just a few of the ways you can be compromised over free WiFi.


The talk also presented us with facts about data: only 1% of the total data is generated via laptops and computers, and the rest is generated by smartphones, smart TVs and other IoT devices. Hence comes a very important point: securing IoT devices.

It was pointed out during the talk that the majority of companies worry about security at the far end of the IoT chain, i.e. in the cloud, but not many care about the edge devices and how a lack of security measures there can compromise them.

There was a really interesting case study from 2016 about how compromised IoT devices brought down the internet for the entire US east coast, and how, in its initial days, this attack was just meant to buy some more time to submit an assignment.

Hackers prefer to exploit IOT devices over cloud infrastructure.

Memes apart, the talk also focused on privacy vs security and how Google’s DNS resolution encryption helps in securing DNS-based internet traffic on the world wide web.

National Critical Information Infrastructure Protection Centre(NCIIPC)

National Critical Information Infrastructure Protection Centre (NCIIPC) is an organisation of the Government of India created under Sec 70A of the Information Technology Act, 2000 (amended 2008), through a gazette notification on 16th Jan 2014, based in New Delhi, India. It is designated as the National Nodal Agency in respect of Critical Information Infrastructure Protection. 

Representatives from this organization were there to speak at the event, and they talked in detail about what a CII (Critical Information Infrastructure) is and how any company with such infrastructure needs to inform the government about it.

A CII is basically any information infrastructure (of a financial, medical or similar institution) which, if compromised, can affect the national security of the country. Attacking any such infrastructure is an act of terrorism as defined by Section 66F of the IT Act, 2000 (amended 2008).

They talked about some of the threats they deal with at the national level. In particular, they talked about how the BGP routing protocol, which works on trust, was recently compromised to route Indian traffic via Pakistani servers/routers.


One more interesting part of the talk was about the composition of the Internet.

We tend to think that the internet we see makes up 90% of the total internet, but in reality it’s just about 4%; bummer, right? The deep web is what comprises roughly 90% of the total internet, and as a matter of fact nobody completely knows the DarkNet and its volume, so even the numbers mentioned above are as good as a guess.

 This was a very insightful talk and put a lot of things in perspective.

Digital Forensics – Beyond the Good Ol’ Data Recovery by Ajith Ravindran 


This talk by Ajith Ravindran mainly focused on Computer forensics, which is the application of investigation and analysis techniques to gather and preserve evidence from a particular computing device in a way that is suitable for presentation in a court of law.

The majority of the tips and tricks shared were about retrieving data from Windows-based machines even after it has been deleted from the system, and how such data can be retrieved in order to serve as proof of crimes.

Some of the tricks talked about are mentioned below : 

The prefetch files in Windows give us the list of files and executables last accessed and the number of times they were executed.

UserAssist allows investigators to see what programs were recently executed on a system.

Shellbags list the folders that have been accessed by a user at least once.

The Master File Table enables us to get a list of all the files on the system, including files that entered the system via the network or USB drives.

$UsnJrnl gives us information regarding user activities over the past 1-2 days.

Hiberfil.sys is a file the system creates when the computer goes into hibernation mode. Hibernate mode uses the Hiberfil.sys file to store the current state (memory) of the PC on the hard drive and the file is used when Windows is turned back on.

This was all from the day 1 talks. Come back next Tuesday for the talks from Day 2, and as the final segment of this series we’ll write about the attack/defense and jeopardy CTF experience.

Stay Tuned, Happy Blogging!

Log Everything as JSON

Logging and monitoring are like Tony Stark and his Iron Man suit: the two always go together. They work best together because they complement each other well.

For many years, logs have been an essential part of troubleshooting application and infrastructure performance. But over time we have realized that logs are not only meant for troubleshooting; they can also be used for business dashboards, visualization and performance analysis.

So logging application data in a file is great, but we need more.

Why is JSON logging the best framework?

To understand the greatness of the JSON logging framework, let’s look at a typical exchange between Anuj (a System Engineer) and Kartik (a Business Analyst): Kartik asks for a business dashboard on top of the application logs, and Anuj builds a custom parser over the plain-text log lines to power a web interface for it.


A few days later, Kartik complains that the web interface is broken. Anuj scratches his head, takes a look at the logs and realizes that a developer has added an extra field to the log lines, which broke his custom parser.

I am sure anyone can face a similar kind of situation.

In this case, if the developer had designed the application to write logs as JSON, creating a parser would have been a piece of cake for Anuj, because he would only have to look up fields by their JSON keys, and it wouldn’t matter how many new fields get added to the log line.

The biggest benefit of logging in JSON is that it has a structured format. This makes it possible to analyze application logs just like Big Data. The logs are not just readable; they form a database that can be queried for each and every field. Also, every programming language can parse JSON.

Magic with JSON logging

Recently, we created a sample Golang application to get hands-on experience with the code build, code test and deployment phases for Golang applications. While writing this application, we incorporated the functionality to write logs in JSON.
The sample logs look something like the illustrative line below (field names such as employee_name and employee_city are hypothetical stand-ins for the actual application fields):-
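{"time":"2019-12-28T10:04:05Z","level":"info","msg":"employee record fetched","employee_name":"Anuj","employee_city":"Noida"}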

And while integrating ELK for log analysis, the only parsing configuration we had to add in Logstash was:-

 
filter {
    json {
        source => "message"
    }
}

After this, we don’t require any further parsing, and we can add as many fields to the log file as we like.

With this in place, all the fields, like employee name and employee city, are available in Kibana, and for this we did not have to add any complex parsing in Logstash or any other tool. I can also create a beautiful business dashboard with this data.

Application Repository Link:-
https://github.com/opstree/Opstree-Go-WebApp

Conclusion

It will not take too long to migrate from text logging to JSON logging, as logging drivers are available for multiple programming languages. I am sure JSON logging will provide more flexibility to your current logging system.
If your organization is using a log management platform like Splunk, ELK, etc., I think JSON logging could be a good companion for it.

Some of the popular logging drivers which support JSON output are:-

Golang:- https://github.com/sirupsen/logrus
Python:- https://github.com/thangbn/json-logging-python
Java:- https://howtodoinjava.com/log4j2/log4j-2-json-configuration-example/
PHP:- https://github.com/nekonomokochan/php-json-logger

I hope now we have a good understanding of JSON logging. So now it’s time to choose your logging wisely.


That’s all I have. Thanks for reading; I’d really appreciate any and all feedback, so please leave a comment below if you have any feedback or queries.

Cheers till next time!!


Redis Cluster: Setup, Sharding and Failover Testing

Watching cluster sharding and failover management is as gripping as watching robotic machinery at work.

My last blog on Redis Cluster was primarily focused on its related concepts and requirements. I would highly recommend going through the concepts first to have a better understanding.

Here, I will move straight to the setup, along with the behaviour of the cluster when I intentionally turn down one Redis service on one of the nodes.
Let’s start from scratch.

Redis Setup

Here, I will follow the approach of a 3-node Redis Cluster, with Redis v5.0 on three CentOS 7.x nodes.

Setup Epel Repo

wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -ivh epel-release-latest-7.noarch.rpm

Setup Remi Repo

yum install http://rpms.remirepo.net/enterprise/remi-release-7.rpm
yum --enablerepo=remi install redis

redis-server --version
Redis server v=5.0.5 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=619d60bfb0a92c36

3-Node Cluster Prerequisites

While setting up the Redis cluster on 3 nodes, I will be following the strategy of having 3 masters and 3 slaves, with one master and one slave running on each node, serving Redis on different ports. As shown below, the Redis services run on port 7000 and port 7001.

  • 7000 port will serve Redis Master
  • 7001 port will serve Redis Slave

Directory Structure

We need to design the directory structure to serve both Redis configurations.

tree /etc/redis
/etc/redis
`-- cluster
    |-- 7000
    |   `-- redis_7000.conf
    `-- 7001
        `-- redis_7001.conf

Redis Configuration

Configuration file for Redis service 1

cat /etc/redis/cluster/7000/redis_7000.conf
port 7000
dir /var/lib/redis/7000/
appendonly yes
protected-mode no
cluster-enabled yes
cluster-node-timeout 5000
cluster-config-file /etc/redis/cluster/7000/nodes_7000.conf
pidfile /var/run/redis_7000.pid

Configuration file for Redis service 2

cat /etc/redis/cluster/7001/redis_7001.conf
port 7001
dir /var/lib/redis/7001
appendonly yes
protected-mode no
cluster-enabled yes
cluster-node-timeout 5000
cluster-config-file /etc/redis/cluster/7001/nodes_7001.conf
pidfile /var/run/redis_7001.pid

Redis Service File

As we are managing multiple services on a single instance, we need to create separate service files for easier management of the Redis services.

Service management file for Redis service 1

cat /etc/systemd/system/redis_7000.service
[Unit]
Description=Redis persistent key-value database
After=network.target

[Service]
ExecStart=/usr/bin/redis-server /etc/redis/cluster/7000/redis_7000.conf --supervised systemd
ExecStop=/bin/redis-cli -h 127.0.0.1 -p 7000 shutdown
Type=notify
User=redis
Group=redis
RuntimeDirectory=redis
RuntimeDirectoryMode=0755
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target

Service management file for Redis service 2

cat /etc/systemd/system/redis_7001.service
[Unit]
Description=Redis persistent key-value database
After=network.target

[Service]
ExecStart=/usr/bin/redis-server /etc/redis/cluster/7001/redis_7001.conf --supervised systemd
ExecStop=/bin/redis-cli -h 127.0.0.1 -p 7001 shutdown
Type=notify
User=redis
Group=redis
RuntimeDirectory=redis
RuntimeDirectoryMode=0755
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
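Before starting the services, a small housekeeping sketch: the data directories follow from the dir settings in the configuration files above (adjust the paths and ownership if your layout differs), and systemd has to be reloaded so that it picks up the new unit files.

# Create the data directories referenced by the configs and hand them to the redis user.
mkdir -p /var/lib/redis/7000 /var/lib/redis/7001
chown -R redis:redis /var/lib/redis

# Reload systemd, then enable and start both Redis services.
systemctl daemon-reload
systemctl enable redis_7000.service redis_7001.service
systemctl start redis_7000.service redis_7001.service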

Redis Service Status

Master Service
systemctl status redis_7000.service 
● redis_7000.service - Redis persistent key-value database
   Loaded: loaded (/etc/systemd/system/redis_7000.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-09-25 08:14:15 UTC; 30min ago
  Process: 2902 ExecStop=/bin/redis-cli -h 127.0.0.1 -p 7000 shutdown (code=exited, status=0/SUCCESS)
 Main PID: 2917 (redis-server)
   CGroup: /system.slice/redis_7000.service
           └─2917 /usr/bin/redis-server *:7000 [cluster]
systemd[1]: Starting Redis persistent key-value database...
redis-server[2917]: 2917:C 25 Sep 2019 08:14:15.752 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis-server[2917]: 2917:C 25 Sep 2019 08:14:15.752 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=2917, just started
redis-server[2917]: 2917:C 25 Sep 2019 08:14:15.752 # Configuration loaded
redis-server[2917]: 2917:C 25 Sep 2019 08:14:15.752 * supervised by systemd, will signal readiness
systemd[1]: Started Redis persistent key-value database.
redis-server[2917]: 2917:M 25 Sep 2019 08:14:15.754 * No cluster configuration found, I'm ff3e4300bec02ed4bd1be9af5d83a5b44249c2b2
redis-server[2917]: 2917:M 25 Sep 2019 08:14:15.756 * Running mode=cluster, port=7000.
redis-server[2917]: 2917:M 25 Sep 2019 08:14:15.756 # Server initialized
redis-server[2917]: 2917:M 25 Sep 2019 08:14:15.756 * Ready to accept connections
Slave Service
systemctl status redis_7001.service
● redis_7001.service - Redis persistent key-value database
   Loaded: loaded (/etc/systemd/system/redis_7001.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-09-25 08:14:15 UTC; 30min ago
  Process: 2902 ExecStop=/bin/redis-cli -h 127.0.0.1 -p 7001 shutdown (code=exited, status=0/SUCCESS)
 Main PID: 2919 (redis-server)
   CGroup: /system.slice/redis_7001.service
           └─2919 /usr/bin/redis-server *:7001 [cluster]
systemd[1]: Starting Redis persistent key-value database...
redis-server[2919]: 2917:C 25 Sep 2019 08:14:15.752 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis-server[2919]: 2917:C 25 Sep 2019 08:14:15.752 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=2917, just started
redis-server[2919]: 2917:C 25 Sep 2019 08:14:15.752 # Configuration loaded
redis-server[2919]: 2917:C 25 Sep 2019 08:14:15.752 * supervised by systemd, will signal readiness
systemd[1]: Started Redis persistent key-value database.
redis-server[2919]: 2917:M 25 Sep 2019 08:14:15.754 * No cluster configuration found, I'm ff3e4300bec02ed4bd1be9af5d83a5b44249c2b2
redis-server[2919]: 2917:M 25 Sep 2019 08:14:15.756 * Running mode=cluster, port=7001.
redis-server[2919]: 2917:M 25 Sep 2019 08:14:15.756 # Server initialized
redis-server[2919]: 2917:M 25 Sep 2019 08:14:15.756 * Ready to accept connections

Redis Cluster Setup

Redis itself provides a CLI tool to set up the cluster.
In the current 3-node scenario, I pick port 7000 on every node to serve the Redis masters and port 7001 to serve the Redis slaves.

redis-cli --cluster create 172.19.33.7:7000 172.19.42.44:7000 172.19.45.201:7000 172.19.33.7:7001 172.19.42.44:7001 172.19.45.201:7001 --cluster-replicas 1

The first 3 addresses will be the masters and the next 3 addresses will be the slaves. It will be cross-node replication, i.e. the slave of any master will reside on a different node, and --cluster-replicas defines the replication factor, i.e. each master will have 1 slave.

>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 172.19.42.44:7001 to 172.19.33.7:7000
Adding replica 172.19.45.201:7001 to 172.19.42.44:7000
Adding replica 172.19.33.7:7001 to 172.19.45.201:7000
M: ff3e4300bec02ed4bd1be9af5d83a5b44249c2b2 172.19.33.7:7000
   slots:[0-5460] (5461 slots) master
M: 314038a48bda3224bad21c3357dbff8305735d72 172.19.42.44:7000
   slots:[5461-10922] (5462 slots) master
M: 19a2c81b7f489bec35eed474ae8e1ad787327db6 172.19.45.201:7000
   slots:[10923-16383] (5461 slots) master
S: 896b2a7195455787b5d8a50966f1034c269c0259 172.19.33.7:7001
   replicates 19a2c81b7f489bec35eed474ae8e1ad787327db6
S: 89206df4f41465bce81f44e25e5fdfa8566424b8 172.19.42.44:7001
   replicates ff3e4300bec02ed4bd1be9af5d83a5b44249c2b2
S: 20ab4b30f3d6d25045909c6c33ab70feb635061c 172.19.45.201:7001
   replicates 314038a48bda3224bad21c3357dbff8305735d72
Can I set the above configuration? (type 'yes' to accept):

A dry run will showcase the cluster setup and ask for confirmation.

Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
..
>>> Performing Cluster Check (using node 172.19.33.7:7000)
M: ff3e4300bec02ed4bd1be9af5d83a5b44249c2b2 172.19.33.7:7000
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 20ab4b30f3d6d25045909c6c33ab70feb635061c 172.19.45.201:7001
   slots: (0 slots) slave
   replicates 314038a48bda3224bad21c3357dbff8305735d72
M: 314038a48bda3224bad21c3357dbff8305735d72 172.19.42.44:7000
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
M: 19a2c81b7f489bec35eed474ae8e1ad787327db6 172.19.45.201:7000
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 89206df4f41465bce81f44e25e5fdfa8566424b8 172.19.42.44:7001
   slots: (0 slots) slave
   replicates ff3e4300bec02ed4bd1be9af5d83a5b44249c2b2
S: 896b2a7195455787b5d8a50966f1034c269c0259 172.19.33.7:7001
   slots: (0 slots) slave
   replicates 19a2c81b7f489bec35eed474ae8e1ad787327db6
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Check Cluster Status

Connect to any of the cluster nodes to check the status of the cluster.

redis-cli -c -h 172.19.33.7 -p 7000
172.19.33.7:7000> cluster nodes
20ab4b30f3d6d25045909c6c33ab70feb635061c 172.19.45.201:7001@17001 slave 314038a48bda3224bad21c3357dbff8305735d72 0 1569402961000 6 connected
314038a48bda3224bad21c3357dbff8305735d72 172.19.42.44:7000@17000 master - 0 1569402961543 2 connected 5461-10922
19a2c81b7f489bec35eed474ae8e1ad787327db6 172.19.45.201:7000@17000 master - 0 1569402960538 3 connected 10923-16383
ff3e4300bec02ed4bd1be9af5d83a5b44249c2b2 172.19.33.7:7000@17000 myself,master - 0 1569402959000 1 connected 0-5460
89206df4f41465bce81f44e25e5fdfa8566424b8 172.19.42.44:7001@17001 slave ff3e4300bec02ed4bd1be9af5d83a5b44249c2b2 0 1569402960000 5 connected
896b2a7195455787b5d8a50966f1034c269c0259 172.19.33.7:7001@17001 slave 19a2c81b7f489bec35eed474ae8e1ad787327db6 0 1569402959936 4 connected

Redis Cluster itself manages the cross-node replication; as seen in the output above, the 172.19.42.44:7000 master is associated with the 172.19.45.201:7001 slave.

Data Sharding

There are 16384 hash slots, and every key maps to one of them (HASH_SLOT = CRC16(key) mod 16384). These slots are divided among the master nodes.
With the 3 masters created above:

  • Server 1 contains hash slots from 0 to 5460.
  • Server 2 contains hash slots from 5461 to 10922.
  • Server 3 contains hash slots from 10923 to 16383.
redis-cli -c -h 172.19.33.7 -p 7000
172.19.33.7:7000> set a 1
-> Redirected to slot [15495] located at 172.19.45.201:7000
OK
172.19.45.201:7000> set b 2
-> Redirected to slot [3300] located at 172.19.33.7:7000
OK
172.19.33.7:7000> set c 3
-> Redirected to slot [7365] located at 172.19.42.44:7000
OK
172.19.42.44:7000> set d 4
-> Redirected to slot [11298] located at 172.19.45.201:7000
OK
172.19.45.201:7000> get b
-> Redirected to slot [3300] located at 172.19.33.7:7000
"2"
172.19.33.7:7000> get a
-> Redirected to slot [15495] located at 172.19.45.201:7000
"1"
172.19.45.201:7000> get c
-> Redirected to slot [7365] located at 172.19.42.44:7000
"3"
172.19.42.44:7000> get d
-> Redirected to slot [11298] located at 172.19.45.201:7000
"4"
172.19.45.201:7000>
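The redirections above follow directly from the slot calculation. If you want to check where a key will land without writing it, the CLUSTER KEYSLOT command exposes the same calculation (the keys and slot numbers here are simply the ones from the demo above, shown as an illustration):

redis-cli -c -h 172.19.33.7 -p 7000 cluster keyslot a
(integer) 15495
redis-cli -c -h 172.19.33.7 -p 7000 cluster keyslot b
(integer) 3300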

Redis Cluster Failover

Stop Master Service

Let’s stop one of the Redis master services; here I stop the master listening on port 7000 on 172.19.42.44.

systemctl stop redis_7000.service
systemctl status redis_7000.service
● redis_7000.service - Redis persistent key-value database
   Loaded: loaded (/etc/systemd/system/redis_7000.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2019-09-25 09:32:37 UTC; 23s ago
  Process: 3232 ExecStop=/bin/redis-cli -h 127.0.0.1 -p 7000 shutdown (code=exited, status=0/SUCCESS)
  Process: 2892 ExecStart=/usr/bin/redis-server /etc/redis/cluster/7000/redis_7000.conf --supervised systemd (code=exited, status=0/SUCCESS)
 Main PID: 2892 (code=exited, status=0/SUCCESS)

Cluster State (Failover)

While checking the cluster status, the Redis master service that was running on port 7000 on 172.19.42.44 is shown as failed and disconnected.

At the same moment its respective slave, 172.19.45.201:7001, gets promoted to master.

redis-cli -c -h 172.19.33.7 -p 7000
172.19.45.201:7000> CLUSTER NODES
314038a48bda3224bad21c3357dbff8305735d72 172.19.42.44:7000@17000 master,fail - 1569403957138 1569403956000 2 disconnected
ff3e4300bec02ed4bd1be9af5d83a5b44249c2b2 172.19.33.7:7000@17000 master - 0 1569404037252 1 connected 0-5460
896b2a7195455787b5d8a50966f1034c269c0259 172.19.33.7:7001@17001 slave 19a2c81b7f489bec35eed474ae8e1ad787327db6 0 1569404036248 4 connected
89206df4f41465bce81f44e25e5fdfa8566424b8 172.19.42.44:7001@17001 slave ff3e4300bec02ed4bd1be9af5d83a5b44249c2b2 0 1569404036752 5 connected
20ab4b30f3d6d25045909c6c33ab70feb635061c 172.19.45.201:7001@17001 master - 0 1569404036000 7 connected 5461-10922
19a2c81b7f489bec35eed474ae8e1ad787327db6 172.19.45.201:7000@17000 myself,master - 0 1569404035000 3 connected 10923-16383
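As an additional sanity check (a hedged sketch; the exact INFO output varies slightly between Redis versions), you can ask the promoted node directly for its role:

redis-cli -h 172.19.45.201 -p 7001 info replication | head -2
# Replication
role:master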

Restarting Stopped Redis

Now we will check the behaviour of the cluster once we fix or restart the Redis service that we intentionally turned down earlier.

systemctl start redis_7000.service
systemctl status redis_7000.service
● redis_7000.service - Redis persistent key-value database
   Loaded: loaded (/etc/systemd/system/redis_7000.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-09-25 09:35:12 UTC; 8s ago
  Process: 3232 ExecStop=/bin/redis-cli -h 127.0.0.1 -p 7000 shutdown (code=exited, status=0/SUCCESS)
 Main PID: 3241 (redis-server)
   CGroup: /system.slice/redis_7000.service
           └─3241 /usr/bin/redis-server *:7000 [cluster]

Cluster State (Recovery)

Finally, all Redis services are back in the running state. The master that we turned down and restarted has rejoined as a slave of its promoted master, as the output below shows.

redis-cli -c -h 172.19.33.7 -p 7000
172.19.45.201:7000> CLUSTER NODES
314038a48bda3224bad21c3357dbff8305735d72 172.19.42.44:7000@17000 slave 20ab4b30f3d6d25045909c6c33ab70feb635061c 0 1569404162565 7 connected
ff3e4300bec02ed4bd1be9af5d83a5b44249c2b2 172.19.33.7:7000@17000 master - 0 1569404162000 1 connected 0-5460
896b2a7195455787b5d8a50966f1034c269c0259 172.19.33.7:7001@17001 slave 19a2c81b7f489bec35eed474ae8e1ad787327db6 0 1569404163567 4 connected
89206df4f41465bce81f44e25e5fdfa8566424b8 172.19.42.44:7001@17001 slave ff3e4300bec02ed4bd1be9af5d83a5b44249c2b2 0 1569404163000 5 connected
20ab4b30f3d6d25045909c6c33ab70feb635061c 172.19.45.201:7001@17001 master - 0 1569404162000 7 connected 5461-10922
19a2c81b7f489bec35eed474ae8e1ad787327db6 172.19.45.201:7000@17000 myself,master - 0 1569404161000 3 connected 10923-16383

It’s not done yet; further, we can explore having a single endpoint for the application to point to. I am currently working on that and will soon come up with a solution.
Apart from this, monitoring the Redis Cluster will also be a major aspect to look into.
Till then, get your hands dirty playing around with the Redis Cluster setup and failover.


One more reason to use Docker

Recently I was working on a project which involved Terraform and AWS. While working on it, I was using my local machine for testing the Terraform code, and luckily everything was going fine. But when we actually wanted to test it for the production environment, we ran into some issues. Then, as usual, we started digging into the issue and finally found it, and it was quite a silly one 😜: the production server’s Terraform version and my local development machine’s Terraform version were not the same.

After wasting quite some time on this issue, I decided to come up with a solution so that this would never happen again.

But before jumping to the solution, let’s think: was this problem only related to Terraform, or have we faced a similar kind of issue in other scenarios as well?

Well, I guess we face similar kinds of issues in other scenarios too. Let’s talk about some of those scenarios first.

Suppose you have to create a CI pipeline for a project, and that too with code reusability. The pipeline is ready and working fine in your project, and then after some time you have to implement the same kind of pipeline for a different project. You can reuse the code, but you don’t know the exact versions of the tools which you were using with the CI pipeline. This will lead you straight into errors.

Let’s take another example: suppose you are developing something in any programming language. Surely that utility or program will have some dependencies as well. Installing those dependencies on your local system can corrupt your whole system or the package manager you use for dependency management. A decent example is pip, the dependency manager for Python 😉.

These are some example scenarios which we have actually faced, and they gave us the motivation for writing this blog.

The Solution

To resolve all these problems we just need one thing, i.e. containers. I could also say Docker, but containers and Docker are two different things.

But yes, for container management we use Docker.

So let’s go back to our first problem, the Terraform one. There are multiple ways to solve it, but we solved it using Docker.

As Docker says

Build Once and Run Anywhere

So, based on this statement, we created a Dockerfile for the required Terraform version and stored it alongside the code. Basically, our Dockerfile looks like this:-

FROM alpine:3.8

MAINTAINER OpsTree.com

ENV TERRAFORM_VERSION=0.11.10

ARG BASE_URL=https://releases.hashicorp.com/terraform

# Install the pinned Terraform version into the image.
RUN apk add --no-cache curl unzip bash \
    && curl -fsSL -o /tmp/terraform.zip ${BASE_URL}/${TERRAFORM_VERSION}/terraform_${TERRAFORM_VERSION}_linux_amd64.zip \
    && unzip /tmp/terraform.zip -d /usr/bin/

# The image switches to a non-root user below, so that user has to exist.
RUN adduser -D opstree

WORKDIR /opstree/terraform

USER opstree

In this Dockerfile, we are pinning the version of Terraform which is needed to run the code.
In a similar fashion, all the other problems listed above can be solved using Docker. We just have to create a Dockerfile with the exact dependencies which are needed, and that same file will work across environments and projects.

To take it to the next level, you can also add a Makefile to make everyone’s life easier. For example:-

IMAGE_TAG=latest
build-image:
    docker build -t opstree/terraform:${IMAGE_TAG} -f Dockerfile .

run-container:
    docker run -itd --name terraform -v ~/.ssh:/root/.ssh/ -v ~/.aws:/root/.aws -v ${PWD}:/opstree/terraform opstree/terraform:${IMAGE_TAG}

plan-infra:
    docker exec -t terraform bash -c "terraform plan"

create-infra:
    docker exec -t terraform bash -c "terraform apply -auto-approve"

destroy-infra:
    docker exec -t terraform bash -c "terraform destroy -auto-approve"

And trust me, after making this utility available, the reactions of the people using it will be priceless.

Now I am assuming you guys will also try to use Docker in as many scenarios as possible.

There are a few more scenarios yet to be explored to enhance the use of Docker; if you find one before I do, please let me know.

Thanks for reading. I’d really appreciate any and all feedback, so please leave a comment below.

Cheers till the next time.

Unix File Tree Part-2

For those who have surfed straight to this blog, please check out the previous part of this series, Unix File Tree Part-1; and for those who have stayed tuned for this part, welcome back. In the previous part, we discussed the philosophy of and the need for the file tree. In this part, we will dive deep into the significance of each directory.


Dayum!! That’s a lot of stuff to gulp at once; we’ll take things one after the other.

Major directories

Let’s talk about the crucial directories which play a major role.

  • /bin: When we started crawling on Linux, this is what helped us get on our feet. Yes, you read it right: whether you want to copy a file, move it somewhere, create a directory, or find out the date or the size of a file, all sorts of basic operations without which the OS won’t even listen to you (Linux yawning meanwhile) happen because of the executables present in this directory. Most of the programs in /bin are in binary format, having been created by a C compiler, but some are shell scripts in modern systems.
  • /etc: When you want things to behave the way you want, you go to /etc and put all your desired configuration there (imagine if your girlfriend had an /etc, life would have been easier). Whether it is about the various services or daemons running on your OS, it makes sure things work the way you want them to.
  • /var: He is the guy who has kept an eye over everything since the time you have booted the system (consider him like Heimdall from Thor). It contains files to which the system writes data during the course of its operation. Among the various sub-directories within /var are /var/cache (contains cached data from application programs), /var/games(contains variable data relating to games in /usr), /var/lib (contains dynamic data libraries and files), /var/lock (contains lock files created by programs to indicate that they are using a particular file or device), /var/log (contains log files), /var/run (contains PIDs and other system information that is valid until the system is booted again) and /var/spool (contains mail, news and printer queues).
  • /proc: You can think of /proc just like thoughts in your brain, which are illusions and virtual. Being an illusory file system, it does not exist on disk; instead, the kernel creates it in memory. It is used to provide information about the system (originally about processes, hence the name). If you navigate to /proc, the first thing that you will notice is that there are some familiar-sounding files, and then a whole bunch of numbered directories. The numbered directories represent processes, better known as PIDs, and within them, the command that occupies them. The files contain system information such as memory (meminfo), CPU information (cpuinfo), and available filesystems. A quick peek is shown right after this list.
  • /opt: It is like a guest room in your house where the guest stayed for prolong period and became part of your home. This directory is reserved for all the software and add-on packages that are not part of the default installation.
  • /usr: In the original Unix implementations, /usr was where the home directories of the users were placed (that is to say, /usr/someone was then the directory now known as /home/someone). In current Unixes, /usr is where user-land programs and data (as opposed to ‘system land’ programs and data) are. The name hasn’t changed, but its meaning has narrowed and lengthened from “everything user related” to “user usable programs and data”. As such, some people may now refer to this directory as meaning ‘User System Resources’ and not ‘user’ as was originally intended.
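As a quick illustration of the /proc point above (a hedged sketch; the output will differ on every machine), you can poke at it straight from a shell:

ls /proc | head -20                          # mostly numbered directories, one per running PID
grep "model name" /proc/cpuinfo | head -1    # CPU information
head -3 /proc/meminfo                        # memory information
cat /proc/$$/cmdline; echo                   # command line of your current shell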

Potato or Potaaato what is the difference? 

We’ll be discussing those directories which always confuse us, the ones which serve almost similar purposes but still live in separate locations, and when asked about them we go like ummmm…….

/bin vs /usr/bin vs /sbin vs /usr/local/bin

This might have become almost clear when I explained the significance of /usr in the paragraph above. Since the Unix designers planned /usr to hold the local directories of individual users, it contained all of the sub-directories like /usr/bin, /usr/sbin and /usr/local/bin. But the question remains the same: how is the content different?

/usr/bin:

  • /usr/bin is a standard directory on Unix-like operating systems that contains most of the executable files that are not needed for booting or repairing the system. 
  • A few of the most commonly used are awk, clear, diff, du, env, file, find, free, gzip, less, locate, man, sudo, tail, telnet, time, top, vim, wc, which, and zip.

/usr/sbin:

  • The /usr/sbin directory contains non-vital system utilities that are used after booting.
  • This is in contrast to the /sbin directory, whose contents include vital system utilities that are necessary before the /usr directory has been mounted (i.e., attached logically to the main filesystem). 
  • A few of the more familiar programs in /usr/sbin are adduser, chroot, groupadd, and userdel. 
  • It also contains some daemons, which are programs that run silently in the background, rather than under the direct control of a user, waiting until they are activated by a particular event or condition such as crond and sshd.

I hope I have covered most of the directories which you might come across frequently, and that your questions have been answered.
Now that we know the significance of each Unix directory, it’s time to use them wisely, the way they are supposed to be used.
Please feel free to reach out to me with any suggestions.
Goodbye till next time!

References:
https://www.tldp.org/LDP/Linux-Filesystem-Hierarchy/html/usr.html
https://askubuntu.com/questions/130186/what-is-the-rationale-for-the-usr-directory
https://askubuntu.com/questions/308045/differences-between-bin-sbin-usr-bin-usr-sbin-usr-local-bin-usr-local
http://index-of.es/Varios-2/How%20Linux%20Works%20What%20Every%20Superuser%20Should%20Know.pdf