Why GitOps is so exciting?

Initially, we had the DevOps framework in which Development and Operation team collaborated to create an agile development ecosystem. Then a new wave came with the name of “DevSecOps” in which we integrated the security into the existing DevOps process. But nowadays a new terminology “GitOps” is getting famous because of its “Single Source of Truth” nature. Its fame has reached to this level that it was a trending topic at KubeCon.

Continue reading “Why GitOps is so exciting?”

Git Inside Out

Git Inside-Out
Man Wearing Black and White Stripe Shirt Looking at White Printer Papers on the Wall

Git is basically a file-system where you can retrieve your content through addresses. It simply means that you can insert any kind of data into git for which Git will hand you back a unique key you can use later to retrieve that content. We would be learning #gitinsideout through this blog

The Git object model has three types: blobs (for files), trees (for folder) and commits. 

Objects are immutable (they are added but not changed) and every object is identified by its unique SHA-1 hash
A blob is just the contents of a file. By default, every new version of a file gets a new blob, which is a snapshot of the file (not a delta like many other versioning systems).
A tree is a list of references to blobs and trees.
A commit is a reference to a tree, a reference to parent commit(s) and some decoration (message, author).

Then there are branches and tags, which are typically just references to commits.

Git stores the data in our .git/objects directory. After initialising a git repository, it automatically creates .git/objects/pack and .git/objects/info with no regular file. After pushing some files, it would reflect in the .git/objects/ folder

OBJECT Blob

blob stores the content of a file and we can check its content by command

git cat-file -p

or git show

OBJECT Tree

A tree is a simple object that has a bunch of pointers to blobs and other trees – it generally represents the contents of a directory or sub-directory.

We can use git ls-tree to list the content of the given tree object

OBJECT Commit

The “commit” object links a physical state of a tree with a description of how we got there and why.

A commit is defined by tree, parent, author, committer, comment

All three objects ( blob,Tree,Commit) are explained in details with the help of a pictorial diagram.

Often we make changes to our code and push it to SCM. I was doing it once and made multiple changes, I was thinking it would be great if I could see the details of changes through local repository itself instead to go to a remote repository server. That pushed me to explore Git more deeply.

I just created a local remote repository with the help of git bare repository. Made some changes and tracked those changes(type, content, size etc).

Below example will help you understand the concept behind it.

Suppose we have cloned a repository named kunal:

Inside the folder where we have cloned the repository, go to the folder kunal then:

cd kunal/.git/

I have added content(hello) to readme.md and made many changes into the same repository as:

adding README.md

updating Readme.md

adding 2 files modifying one

pull request

commit(adding directory).

Go to the refer folder inside .git and take the SHA value for the master head:

This commit object we can explore further with the help of cat-file which will show the type and content of tree and commit object:

Now we can see a tree object inside the tree object. Further, we can see the details for the tree object which in turn contains a blob object as below:

Below is the pictorial representation for the same:

More elaborated representation for the same :

Below are the commands for checking the content, type and size of objects( blob, tree and commit)

kunal@work:/home/git/test/kunal# cat README.md
hello

We can find the details of objects( size,type,content) with the help of #git cat-file

git-cat-file:- Provide content, type or size information for repository objects

You an verify the content of commit object and its type with git cat-file as below:

kunal@work:/home/git/test/kunal/.git # cat logs/refs/heads/master

Checking the content of a blob object(README.md, kunal and sandy)

As we can see first one is adding read me , so it is giving null parent(00000…000) and its unique SHA-1 is 912a4e85afac3b737797b5a09387a68afad816d6

Below are the details that we can fetch from above SHA-1 with the help of git cat-file :

Consider one example of merge:

Created a test branch and made changes and merged it to master.

Here you can notice we have two parents because of a merge request

You can further see the content, size, type of repository #gitobjects like:

Summary

This is pretty lengthy article but I’ve tried to make it as transparent and clear as possible. Once you work through the article and understand all concepts I showed here you will be able to work with Git more effectively.

This explanation gives the details regarding tree data structure and internal storage of objects. You can check the content (differences/commits)of the files through local .git repository which stores each object with unique  SHA  hash. This would clear basically the internal working of git.
Hopefully, this blog would help you in understanding the git inside out and helps in troubleshooting things related to git.

Git-Submodule

Rocket Science has always fascinated me, but one thing which totally blows my mind is the concept of modules aka. modular rockets. The literal definition of modules statesA modular rocket is a type of multistage rocket which features components that can be interchanged for specific mission requirements.” In simple terms, you can say that the Super Rocket depends upon those Submodules to get the things done.
Similarly is the case in the Software world, where super projects have multiple dependencies on other objects. And if we talk about managing projects Git can’t be ignored, Moreover Git has a concept of Submodules which is slightly inspired by the amazing rocket science of modules.

Hour of Need

Being a DevOps Specialist we need to do provisioning of the Infrastructure of our clients which is sometimes common for most of the clients. We decided to Automate it, which a DevOps is habitual of. Hence, Opstree Solutions initiated an Internal project named OSM. In which we create Ansible Roles of different opensource software with the contribution of each member of our organization. So that those roles can be used in the provisioning of the client’s infrastructure.
This makes the client projects dependent on our OSM. Which creates a problem statement to manage all dependencies which might get updated over the period. And to do that there is a lot of copy paste, deleting the repository and cloning them again to get the updated version, which is itself a hair-pulling task and obviously not the best practice.
Here comes the git-submodule as a modular rocket to take our Super Rocket to its destination.

Let’s Liftoff with Git-Submodules

A submodule is a repository embedded inside another repository. The submodule has its own history; the repository it is embedded in is called a superproject.

In simple terms, a submodule is a git repository inside a Superproject’s git repository, which has its own .git folder which contains all the information that is necessary for your project in version control and all the information about commits, remote repository address etc. It is like an attached repository inside your main repository, which can be used to reuse a code inside it as a “module“.
Let’s get a practical use case of submodules.
We have a client let’s call it “Armstrong” who needs few of our ansible roles of OSM for their provisioning of Infrastructure. Let’s have a look at their git repository below.

$    cd provisioner
$    ls -a
     .  ..  ansible  .git  inventory  jenkins  playbooks  README.md  roles
$    cd roles
$    ls -a
     apache  java   nginx  redis  tomcat
We can see in this Armstrong’s provisioner repository(a git repository) depends upon five roles which are available in OSM’s repository to help Armstrong to provision their infrastructure. So we’ll add submodules osm_java and others.

$    cd java
$    git submodule add -b armstrong git@gitlab.com:oosm/osm_java.git osm_java
     Cloning into './provisioner/roles/java/osm_java'...
     remote: Enumerating objects: 23, done.
     remote: Counting objects: 100% (23/23), done.
     remote: Compressing objects: 100% (17/17), done.
     remote: Total 23 (delta 3), reused 0 (delta 0)
     Receiving objects: 100% (23/23), done.
     Resolving deltas: 100% (3/3), done.

With the above command, we are adding a submodule named osm_java whose URL is git@gitlab.com:oosm/osm_java.git and branch is armstrong. The name of the branch is coined armstrong because to keep the configuration of each of our client’s requirement isolated, we created individual branches of OSM’s repositories on the basis of client name.
Now if take a look at our superproject provisioner we can see a file named .gitmodules which has the information regarding the submodules.

$    cd provisioner
$    ls -a
     .  ..  ansible  .git  .gitmodules  inventory  jenkins  playbooks  README.md  roles
$    cat .gitmodules
     [submodule "roles/java/osm_java"]
     path = roles/java/osm_java
     url = git@gitlab.com:oosm/osm_java.git
     branch = armstrong

Here you can clearly see that a submodule osm_java has been attached to the superproject provisioner.

What if there was no submodule?

If that was a case, then we need to clone the repository from osm and paste it to the provisioner then add & commit it to the provisioner phew….. that would also have worked.
But what if there is some update has been made in the osm_java which have to be used in provisioner, we can not easily sync with the OSM. We would need to delete osm_java, again clone, copy, and paste in the provisioner which sounds clumsy and not a best way to automate the process.
Being a osm_java as a submodule we can easily update that this dependency without messing up the things.

$    git submodule status
     -d3bf24ff3335d8095e1f6a82b0a0a78a5baa5fda roles/java/osm_java
$    git submodule update --remote
     remote: Enumerating objects: 3, done.
     remote: Counting objects: 100% (3/3), done.
     remote: Total 2 (delta 0), reused 2 (delta 0), pack-reused 0
     Unpacking objects: 100% (2/2), done.
     From git@gitlab.com:oosm/osm_java.git     0564d78..04ca88b  armstrong     -> origin/armstrong
     Submodule path 'roles/java/osm_java': checked out '04ca88b1561237854f3eb361260c07824c453086'

By using the above update command we have successfully updated the submodule which actually pulled the changes from OSM’s origin armstrong branch.

What have we learned? 

In this blog post, we learned to make use of git-submodules to keep our dependent repositories updated with our super project, and not getting our hands dirty with gullible copy and paste.
Kick-off those practices which might ruin the fun, sit back and enjoy the automation.

Referred links:
Image: google.com
Documentation: https://git-scm.com/docs/gitsubmodules


Gitolite

 

Requirement

We need private git repositories for internally use in our project so we use Gitolite for this requirement. Our client has a lot of consultants, partners and short term employees working with their code so they needed a good way of controlling access to the repos and preferably without giving each of them a unix user on the server where the repo is hosted.

What is Gitolite?

Gitolite is basically an access layer on top of Git. Users are granted access to repos via a simple config file and we as an admin only needs the users public SSH key and a username from the user. Gitolite uses this to grant or deny access to our Git repositories. And it does this via a git repository named gitolite-admin.

Installation

We need a public key and a Gitolite user through which we will setup the Gitolite.

In this case I have used my base machine(Ubuntu) public key so that only my machine can manage Gitolite.

Now we will copy this public key to a virtual machine

$ scp ~/.ssh/gitolite.pub git@192.168.0.20:/home/git

 

where vagrant is the user of my virtual machine & its IP is 192.168.0.20

Now we will install & create a gitolite user on remote machine which will be hosting gitolite.

root@git:~# apt-get install gitolite3
 
root@git:~# adduser gitolite
 
Now we need to remove password of gitolite user from below command
 
root@git:~# passwd -d gitolite
 
 
Let’s move & change the ownership of this public key.
root@git:~# mv gitolite.pub /home/gitolite/
root@git:~# chown gitolite:gitolite /home/gitolite/gitolite.pub
 
Become the gitolite user
 
root@git:~# su – gitolite
 
Now setup the gitolite with the public key
 
gitolite@git:~# gitolite setup -pk gitolite.pub
 
Now to manage the repositories, users and access-rights we will download the gitolite-admin(git repository) to our base machine.
 
$ git clone gitolite@192.168.0.20:gitolite-admin
$ cd gitolite-admin
$ ls -l
total
8
drwxr-xr-x
2 nitin nitin 4096 Jan 10 17:52 conf/
drwxr-xr-x
2 nitin nitin 4096 Jan  9 13:43 keydir/
 
where “keydir” is the directory where we store our user’s keys and that key name must be same as existing username on the system.
 
In conf directory there is a “gitolite.conf” file which controls which repositories are available on the system and who has which rights to those repositories.
We just need to add new repository name & users who will access it and this file will create the repo & grant the permission on it accordingly.
Let us explore my gitolite.conf file in which I have added a new repository called “opstreeblog”
$ cat conf/gitolite.conf

# Group name & members

@admin = nitin
@staff    = jatin james
 
# Gitolite admin repository

repo gitolite-admin
RW+   = gitolite @admin
 
# Read-Write permission to all the users on testing repo

repo testing
RW+    = @all
 
# Read-Write permission to user sandy & the admin group. And Read-Only access to staff group

repo opstreeblog
   RW+   = sandy @admin
   R         = @staff

where ‘@’ denotes the user group i.e @staff is a group & jatin, james are the users of this group and these names must be similar to the key name stored in keydir directory.
For example “jatin” user must have the public key named “jatin.pub”

Let’s have a quick test of our setup

$ git commit conf/gitolite.conf -m “added opstreeblog repo”
 
[master 357bbc8] added “opstreeblog” repo
 
1 files changed, 9 insertions(+), 1 deletions(-)
 
nitin@Latitude-3460:~/gitolite-admin$ git push origin master
 
Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 428 bytes, done.
Total
4 (delta 0), reused 0 (delta 0)
remote: Initialized empty Git repository in /home/gitolite/repositories/opstreeblog.git/
To gitbox:gitolite-admin d595439..357bbc8
master -> master
 
I hope that gives you a good overview of how to install and manage Gitolite.

jgit-flow maven plugin to Release Java Application

Introduction

As a DevOps I need a smooth way to release the java application, so I compared two maven plugin that are used to release the java application and in the end I found that Jgit-flow plugin is far better than maven-release plugin on the basis of following points:
  • Maven-release plugin creates .backup and release.properties files to your working directory which can be committed mistakenly, when they should not be. jgit-flow maven plugin doesn’t create these files or any other file in your working directory.
  • Maven-release plugin create two tags.
  • Maven-release plugin does a build in the prepare goal and a build in the perform goal causing tests to run 2 times but jgit-flow maven plugin builds project once so tests run only once.
  • If something goes wrong during the maven plugin execution, It become very tough to roll it back, on the other hand jgit-flow maven plugin makes all changes into the branch and if you want to roll back just delete that branch.
  • jgit-flow maven plugin doesn’t run site-deploy
  • jgit-flow maven plugin provides option to turn on/off maven deployment
  • jgit-flow maven plugin provides option to turn on/off remote pushes/tagging
  • jgit-flow maven plugin keeps the master branch always at latest release version.
Now let’s see how to integrate Jgit-flow maven plugin and use it

How to use Jgit-flow maven Plugin for Release

Follow the flowing steps
    1. Add the following lines in your pom.xml for source code management access
      
         scm:git:
        scm:git:git:
    2. Add these line to resolve the Jgit-flow maven plugin and put the other option that will be required during the build
      com.atlassian.maven.plugins
      maven-jgitflow-plugin
      1.0-m4.3
              
      true
      false
      true
      true
      true
      true
      true
      true
                
      master-test
      deploy-test       

Above code snippet will perform following steps:

    • Maven will resolve the jgitflow plug-in dependency
    • In the configuration section, we describe how jgit-flow plug-in will behave.
    • pushRelease XML tag to enable and disable jgit-flow from releasing the intermediate branches into the git or not.
    • keepBranch XML tag to enable and disable the plug-in for keep the intermediate branch or not.
    • noTag XMl tag to enable and disable the plug-in to create the that tag in git.
    • allowUntracked XML tag to whether allow untracked file during the checking.
    • flowInitContext XML tag is used to override the default and branch name of the jgit-flow plug-in
    • In above code snippet, there is only two branches, master from where that code will be pulled and a intermediate branch that will be used by the jgit-flow plug-in. as I have discussed that jgit-flow plug-in uses the branches to keep it records. so development branch will be created by the plug-in that resides in the local not remotely, to track the release version etc.
  1. To put your releases into the repository manager add these lines
    <distributionManagement>
      <repository>
        <id><auth id></id>
        <url><repo url of repository managers></url>
      </repository>
      <snapshotRepository>
        <id><auth id></id>
        <url><repo url of repository managers></url>
      </snapshotRepository>
    </distributionManagement>
  2. Put the following lines into your m2/settings.xml with your repository manager credentials
    <settings>
      <servers>
        <server>
            <id><PUT THE ID OF THE REPOSITORY OR SNAPSHOTS ID HERE></id>
           <username><USERNAME></username>
           <password><PASSWORD></password>
        </server>
      </servers>
    </settings>

Start Release jgit-flow maven plugin command

To start the new release execute jgitflow:release-start.

Finish Release jgit-flow maven plugin  command

To finish new release, execute mvn jgitflow:release-finish.
For a example I have created a repository in github.com. for testing and two branch master-test and deploy-test. It is assumed that you have configured maven and git your system.

In the deploy-test branch run following command
$ mvn clean -Dmaven.test.skip=true install jgitflow:release-start

This command will take input from you for release version and create a release branch with release/. then it will push this release branch into github repository for temporarily because we are not saving the intermediate branched

Now At the end run this command
$ mvn -Dmaven.test.skip=true jgitflow:release-finish
after finishing this command it will delete release/ from local and remote.

Now you can check the changes in pom file by jgitflow. in the above snapshot, it is master-test branch, you can see in the tag it has removed the snapshot and also increased the version.  It hold the current version of the application.

And in the deploy-test branch it show you new branch on which developers are working on

Opstree SHOA Part 1: Build & Release

At Opstree we have started a new initiative called SHOA, Saturday Hands On Activity. Under this program we pick up a concept, tool or technology and do a hands on activity. At the end of the day whatever we do is followed by a blog or series of blog that we have understood during the day.
 
Since this is the first Hands On Activity so we are starting with Build & Release.

What we intend to do 

Setup Build & Release for project under git repository https://github.com/OpsTree/ContinuousIntegration.

What all we will be doing to achieve it

  • Finalize a SCM tool that we are going to use puppet/chef/ansible.
  • Automated setup of Jenkins using SCM tool.
  • Automated setup of Nexus/Artifactory/Archiva using SCM tool.
  • Automated setup of Sonar using SCM tool.
  • Dev Environment setup using SCM tool: Since this is a web app project so our Devw443 environment will have Nginx & tomcat.
  • QA Environment setup using SCM tool: Since this is a web app project so our QA environment will have Nginx & tomcat.
  • Creation of various build jobs
    • Code Stability Job.
    • Code Quality Job.
    • Code Coverage Job.
    • Functional Test Job on dev environment.
  • Creation of release Job.
  • Creation of deployment job to do deployment on Dev & QA environment.
This activity is open for public as well so if you have any suggestion or you want to attend it you are most welcome

Tip : Setting up Git Jenkins integration on windows box

If you have ever tried setting up git as a version control system in a Jenkins installation on a windows box you would have faced an error message ssh key not available.

The reason behind this issue is that if you are using git with ssh protocol it tries to use your private key to perform git operations over ssh protocol & the location it expects is the .ssh folder at home directory of user. To fix this issue you have to create a HOME environment variable and point to your home directory where your .ssh folder exists after that restart Jenkins & now it should work fine.

Automation tips and tricks

As promised I’m back with the summary of cool stuff that I’ve done with my team in Build & Release domain to help us deal with day to day problems in efficient & effective way. As I said this month was about creating tools/utilities that sounds very simple but overall their impact in productivity & agility of build release teams and tech verticals was awesome :).

Automated deployment of Artifacts : If you have ever worked with a set of maven based projects that are interdependent on each other, one of the major problem that you will face in such a setup is to have the latest dependencies in your local system. Here I’m assuming two things you would be using a Maven Repo to host the artifacts & the dependencies would be SNAPSHOT dependencies if their is active development going on dependencies as well. Now the manual way of making sure that maven repo will always have the latest SNAPSHOT version is that every-time somebody does change in the code-base he/she manually deploy that artifact to maven repo. What we have done is that for each & every project we have created a Jenkins job that check if code is checked in for a specific component & if so that component’s SNAPSHOT version get’s deployed to maven repo. The impact of these utilities jobs was huge as now all the developers doesn’t have to focus on deploying their code to maven repo & also keeping track of who last committed the code was also not needed.

Log Parser Utility : We have done further improvement in our event based log analyzer utility. Now we also have a simple log parser utility through which we can parse the logs of a specific component & segregate the logs as per ERROR/WARN/INFO. Most importantly it is integrated with jenkins so you can go to jenkins select a component whose log needs to be analyzed, once analysis is finished the logs are segregated as per our configuration(in our case it is ERROR/WARN/INFO) after that in the left bar these segregations are shown with all the various instances of these categories and user can click on those links to go exactly at the location where that information is present in logs

Auto Code Merge : As I already told we have a team of around 100+ developers & a sprint cycle of 10 days and two sprints overlap each other for 5 days i.e first 5 days for development after tat code freeze is enforced and next 5 days are for bug fixing which means that at a particular point of time there are 3 parallel branches on which work is under progress one branch which is currently deployed in production second branch on which testing is happening and third branch on which active development is happening. You can easily imagine that merging these branches is a task in itself. What we have done is to create an automated code merge utility that tries to merge branches in a per-defined sequence if automatic merge is successful the merge proceeds for next set of branches otherwise a mail is sent to respective developers whose files are in conflict mode

Hope you will get motivated by these set of utilities & come up with new suggestions or point of improvements

Efficiently handling Code merge in Version Control System

One of the painful & mundane task that release engineers have to perform is to merge changes of one branch into another branch & in case of code conflicts the release engineer has to co-ordinate with all the developers to resolve those merge conflicts.

In our current setup the problem is more critical as development of two releases overlap with each other . We have a sprint cycle of 10 days where we have 5 days of active development after that code freeze is implemented & rest 5 days are only for big fixes. The next sprint starts just after the code freeze date of previous release. In ideal scenario this setup should work well but the biggest assumption behind successful execution of the process is their should be minimum code check-ins after code freeze & usually that doesn’t happens. This results in parallel development in 2 branches & therefore while merging two branches their are lot of code conflicts.

The real problem starts when we start merging code, as currently their are close to 100 developers working on the same code-base which means a huge list of files in conflict mode & you have to chase down each & every person to resolve those conflicts. To overcome the above-said problem we are planning to do 2 things.

First one is instead of doing merge after a long duration we are planning to increase the frequency of merge from once in 5 day to twice a day which would help us to reduce the list of conflicting files.

As I always strive to automate things as much as possible, the second part is to at-least create an automated tool that will perform a dummy merge of two branches and list out all the files that would result in conflict mode along with listing the last user’s who have modified the files in respective branch.

We are expecting 60-70% efficiency in code merge process, let’s see how things goes. Feel free to drop any ideas if you have or in case of any concerns :).

Although I tried to be as generic as possible, but just to let you know we are using Git as version control system.

Git : How to fix issues in a merged branch

Sometime back we faced a peculiar problem in our project where we have to fix some issue introduced due to a merged branch. To explain this problem first I’ve to explain the way we manage branches. As we entered in the maintenance mode of project there was a dire need of fixing issues and working on change requests in isolation. Whenever we need to fix an issue we cut a issue-branch out of main branch fix that issue in isolation and after verifying and making sure the issue is fixed we merge that issue-branch back in main branch. The issue we faced with the approach is that after merging an issue branch with the main branch sometimes we came across some regression bugs due to the issue branch. There were two possible solutions to overcome the problem.

First solution is to fix the issue in issue branch and merge the issue branch again with master. This solution can work but the problem with this approach is that till the time you haven’t fixed the issue introduced due to issue branch and merged it back with master branch you can not create a new issue-branch and much bigger problem is you can’t release.

The second solution is if we can somehow revert the merge of issue branch with main branch and then fix the issue in issue-branch, after that merging the fixed issue branch with the main branch. This approach seems to be straight forward and more logical. Git comes up with a cool command git-revert which can revert existing commits and even revert merge of another branch. I’ll talk about the solution in the next blog 😉