Initially, we had the DevOps framework, in which the Development and Operations teams collaborated to create an agile development ecosystem. Then came a new wave named “DevSecOps”, which integrated security into the existing DevOps process. Nowadays a new term, “GitOps”, is gaining fame because of its “single source of truth” nature. Its fame has reached the point that it was a trending topic at KubeCon.
Git is basically a content-addressable file system: you can insert any kind of content into it, and Git hands you back a unique key that you can use later to retrieve that content. We will be learning #gitinsideout through this blog.
The Git object model has three object types: blobs (for files), trees (for folders) and commits.
Objects are immutable (they are added but never changed), and every object is identified by its unique SHA-1 hash.
A blob is just the contents of a file. By default, every new version of a file gets a new blob, which is a snapshot of the file (not a delta like many other versioning systems).
A tree is a list of references to blobs and trees.
A commit is a reference to a tree, a reference to parent commit(s) and some decoration (message, author).
Then there are branches and tags, which are typically just references to commits.
Git stores its data in the .git/objects directory. After initialising a git repository, Git automatically creates .git/objects/pack and .git/objects/info, with no regular files inside. As you add and commit files, objects appear under .git/objects/.
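You can watch this happen in a throwaway repository (a minimal sketch; any empty directory will do):

```shell
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
# right after init: no object files yet, only the empty info/ and pack/ dirs
find .git/objects -type f | wc -l        # prints 0
# writing a blob creates the first object file, stored under a 2-character dir
echo hello | git hash-object -w --stdin  # prints the blob's SHA-1
find .git/objects -type f
```

The SHA-1 printed by `git hash-object` is the key you can later hand back to Git to retrieve the content.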
A blob stores the content of a file, and we can check that content with git cat-file -p <sha> or git show <sha>.
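For instance, writing a blob by hand and then inspecting it (a minimal sketch in a scratch repository):

```shell
set -e
cd "$(mktemp -d)" && git init -q
# store "hello" as a blob; git returns its SHA-1 key
sha=$(echo hello | git hash-object -w --stdin)
git cat-file -t "$sha"   # type    -> blob
git cat-file -s "$sha"   # size    -> 6 (five letters plus a newline)
git cat-file -p "$sha"   # content -> hello
git show "$sha"          # same content
```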
A commit is defined by a tree, parent(s), author, committer and a commit message.
All three object types (blob, tree and commit) are explained in detail with the help of a pictorial diagram.
Often we make changes to our code and push them to SCM. Once, while making multiple changes, I thought it would be great if I could see the details of those changes through the local repository itself, instead of going to a remote repository server. That pushed me to explore Git more deeply.
So I created a local remote with the help of a git bare repository, made some changes, and tracked those changes (type, content, size etc.).
The example below will help you understand the concept behind it.
Suppose we have cloned a repository named kunal:
Inside the folder where we have cloned the repository, go to the folder kunal then:
I have added content (hello) to readme.md and made several changes to the same repository: adding two files and modifying one.
Go to the refs folder inside .git and take the SHA value for the master head:
We can explore this commit object further with the help of cat-file, which shows the type and content of tree and commit objects:
Now we can see a tree object inside the tree object. Drilling further into that tree object, we find it in turn contains a blob object, as below:
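The same commit → tree → blob drill-down can be reproduced in any repository (a sketch; the file and commit names here stand in for the kunal example):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email kunal@example.com && git config user.name kunal
echo hello > readme.md
git add readme.md && git commit -qm "adding readme"
git cat-file -p HEAD                   # the commit: tree, author, committer, message
tree=$(git cat-file -p HEAD | awk '/^tree/ {print $2}')
git cat-file -p "$tree"                # the tree: a blob entry for readme.md
blob=$(git cat-file -p "$tree" | awk '{print $3}')
git cat-file -p "$blob"                # the blob: hello
```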
Below is the pictorial representation for the same:
A more elaborate representation of the same:
Below are the commands for checking the content, type and size of objects (blob, tree and commit). We can find all these details with the help of #git cat-file:

git cat-file: provides the content (-p), type (-t) or size (-s) of repository objects.
You can verify the content of a commit object and its type with git cat-file as below:
kunal@work:/home/git/test/kunal/.git # cat logs/refs/heads/master
Checking the content of the blob objects (README.md, kunal and sandy):
As we can see, the first one adds the README, so it has a null parent (00000…000), and its unique SHA-1 is 912a4e85afac3b737797b5a09387a68afad816d6.
Below are the details that we can fetch from the above SHA-1 with the help of git cat-file:
Consider an example of a merge:

I created a test branch, made some changes, and merged it into master.

Here you can see we have two parents because of the merge.
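You can reproduce this in a scratch repository (a sketch; the branch names follow the example above):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email kunal@example.com && git config user.name kunal
echo a > a.txt && git add . && git commit -qm "first"
git checkout -qb test
echo b > b.txt && git add . && git commit -qm "change on test"
git checkout -q "@{-1}"                      # back to the original branch
echo c > c.txt && git add . && git commit -qm "change on original branch"
git merge -q --no-ff -m "merge test" test
# a merge commit records both lines of history as two parent entries
git cat-file -p HEAD | grep '^parent'
```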
You can further check the content, size and type of repository #gitobjects like:
This is a pretty lengthy article, but I’ve tried to make it as transparent and clear as possible. Once you work through it and understand all the concepts shown here, you will be able to work with Git more effectively.

This explanation covered the tree data structure and the internal storage of objects. You can check the content (differences/commits) of files through the local .git repository, which stores each object with a unique SHA hash. This should clarify the internal working of git.

Hopefully, this blog helps you understand Git inside out and helps in troubleshooting things related to Git.
Rocket science has always fascinated me, but one thing that totally blows my mind is the concept of modules, aka modular rockets. The literal definition states: “A modular rocket is a type of multistage rocket which features components that can be interchanged for specific mission requirements.” In simple terms, you can say that the super rocket depends upon those submodules to get things done.
The case is similar in the software world, where superprojects have multiple dependencies on other projects. And if we talk about managing projects, Git can’t be ignored. Moreover, Git has a concept of submodules, which is slightly inspired by the amazing rocket science of modules.
Hour of Need
As DevOps specialists, we provision our clients’ infrastructure, much of which is common across clients. We decided to automate it, as a DevOps engineer is in the habit of doing. Hence, Opstree Solutions initiated an internal project named OSM, in which we create Ansible roles for different open-source software, with contributions from each member of our organization, so that those roles can be used in provisioning the clients’ infrastructure.

This makes the client projects dependent on our OSM, which creates a problem statement: managing all the dependencies, which may get updated over time. Doing that by hand means a lot of copy-pasting, deleting repositories and cloning them again to get the updated version, which is itself a hair-pulling task and obviously not a best practice.
Here comes the git-submodule as a modular rocket to take our Super Rocket to its destination.
Let’s Liftoff with Git-Submodules
In simple terms, a submodule is a git repository inside a superproject’s git repository. It has its own .git folder, which contains everything needed for version control: the commits, the remote repository address, etc. It is like an attached repository inside your main repository, whose code can be reused as a “module”.
Let’s get a practical use case of submodules.
We have a client, let’s call them “Armstrong”, who needs a few of our OSM Ansible roles for provisioning their infrastructure. Let’s have a look at their git repository below.
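The add command looks like the following. In this runnable sketch a local stand-in repository replaces the real SSH remote (email@example.com:oosm/osm_java.git) so it works anywhere; the `protocol.file.allow` override is only needed because recent git versions restrict file-protocol submodules:

```shell
set -e
cd "$(mktemp -d)"
# local stand-in for the real OSM remote, with the client branch "armstrong"
git init -q -b armstrong osm_java
git -C osm_java config user.email dev@example.com
git -C osm_java config user.name dev
git -C osm_java commit -q --allow-empty -m "osm_java role for armstrong"
# the superproject
git init -q -b master provisioner
cd provisioner
git config user.email dev@example.com && git config user.name dev
# attach osm_java as a submodule tracking the "armstrong" branch
git -c protocol.file.allow=always submodule add -b armstrong ../osm_java osm_java
cat .gitmodules
```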
With the above command, we add a submodule named osm_java whose URL is email@example.com:oosm/osm_java.git and whose branch is armstrong. The branch is named armstrong because, to keep each client’s configuration isolated, we create individual branches of OSM’s repositories per client.
Now if we take a look at our superproject provisioner, we can see a file named .gitmodules which holds the information about the submodules.
Here you can clearly see that a submodule osm_java has been attached to the superproject provisioner.
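For reference, the .gitmodules entry for this setup would look roughly like this (path, URL and branch as described above):

```ini
[submodule "osm_java"]
	path = osm_java
	url = email@example.com:oosm/osm_java.git
	branch = armstrong
```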
What if there was no submodule?
If that were the case, then we would need to clone the repository from OSM, paste it into the provisioner, then add and commit it to the provisioner. Phew… that would also have worked.

But what if an update is made in osm_java that has to be used in the provisioner? We cannot easily sync with OSM. We would need to delete osm_java, clone again, copy, and paste into the provisioner, which sounds clumsy and is not the best way to automate the process.
With osm_java as a submodule, we can easily update this dependency without messing things up.
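The update boils down to one command, `git submodule update --remote`. Here is a runnable sketch with a local stand-in remote (in real use the remote would be the OSM SSH URL, and the `protocol.file.allow` override would be unnecessary):

```shell
set -e
cd "$(mktemp -d)"
# stand-in OSM remote with an "armstrong" branch
git init -q -b armstrong osm_java
git -C osm_java config user.email dev@example.com
git -C osm_java config user.name dev
git -C osm_java commit -q --allow-empty -m "role v1"
# superproject with osm_java attached as a submodule
git init -q -b master provisioner
cd provisioner
git config user.email dev@example.com && git config user.name dev
git -c protocol.file.allow=always submodule add -b armstrong ../osm_java osm_java
git commit -qm "add osm_java submodule"
# upstream OSM moves ahead on the armstrong branch
git -C ../osm_java commit -q --allow-empty -m "role v2"
# pull the submodule up to the tip of its tracked branch
git -c protocol.file.allow=always submodule update --remote osm_java
git -C osm_java log --oneline -1
```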
By using the above update command, we successfully updated the submodule, which pulled the changes from OSM’s origin armstrong branch.
What have we learned?
What is Gitolite?
where vagrant is the user of my virtual machine & its IP is 192.168.0.20
Now we will install gitolite and create a gitolite user on the remote machine which will host gitolite.
2 nitin nitin 4096 Jan 10 17:52 conf/
2 nitin nitin 4096 Jan 9 13:43 keydir/
# Group name & members
@admin = nitin
where ‘@’ denotes a user group, i.e. @staff is a group and jatin and james are the users of this group; these names must match the key names stored in the keydir directory.
For example, user “jatin” must have a public key named “jatin.pub”.
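Putting the pieces together, a minimal gitolite.conf granting the group access to a repository might look like this (the repo name testing is an assumption for illustration):

```ini
@staff = jatin james

repo testing
    RW+ = @staff
```

After committing and pushing this conf (plus the matching jatin.pub and james.pub in keydir) from the gitolite-admin repository, gitolite creates the repo and applies the access rules.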
Let’s have a quick test of our setup
4 (delta 0), reused 0 (delta 0)
master -> master
- The maven-release plugin creates .backup and release.properties files in your working directory which can be committed by mistake when they should not be. The jgit-flow maven plugin doesn’t create these files, or any other file, in your working directory.
- The maven-release plugin creates two tags.
- The maven-release plugin does a build in the prepare goal and another build in the perform goal, causing tests to run twice; the jgit-flow maven plugin builds the project once, so tests run only once.
- If something goes wrong during maven-release plugin execution, it becomes very tough to roll back. The jgit-flow maven plugin, on the other hand, makes all its changes on a branch; if you want to roll back, just delete that branch.
- The jgit-flow maven plugin doesn’t run site-deploy.
- The jgit-flow maven plugin provides an option to turn maven deployment on or off.
- The jgit-flow maven plugin provides an option to turn remote pushes/tagging on or off.
- The jgit-flow maven plugin keeps the master branch always at the latest release version.
How to use Jgit-flow maven Plugin for Release
- Add the following lines in your pom.xml for source code management access
- Add these lines to resolve the jgit-flow maven plugin and set the other options that will be required during the build:
<build>
  <plugins>
    <plugin>
      <groupId>com.atlassian.maven.plugins</groupId>
      <artifactId>maven-jgitflow-plugin</artifactId>
      <version>1.0-m4.3</version>
      <configuration>
        <!-- flag element names reconstructed; check them against your plugin version -->
        <enableSshAgent>true</enableSshAgent>
        <noDeploy>false</noDeploy>
        <pushReleases>true</pushReleases>
        <keepBranch>true</keepBranch>
        <noTag>true</noTag>
        <allowUntracked>true</allowUntracked>
        <allowSnapshots>true</allowSnapshots>
        <autoVersionSubmodules>true</autoVersionSubmodules>
        <flowInitContext>
          <masterBranchName>master-test</masterBranchName>
          <developBranchName>deploy-test</developBranchName>
        </flowInitContext>
      </configuration>
    </plugin>
  </plugins>
</build>
The above code snippet performs the following steps:
- Maven resolves the jgit-flow plug-in dependency.
- In the configuration section, we describe how the jgit-flow plug-in will behave.
- The pushReleases XML tag enables or disables pushing the intermediate release branches to the remote git repository.
- The keepBranch XML tag controls whether the plug-in keeps the intermediate branch after the release.
- The noTag XML tag controls whether the plug-in creates a tag in git.
- The allowUntracked XML tag controls whether untracked files are allowed during the check.
- The flowInitContext XML tag is used to override the default master and development branch names of the jgit-flow plug-in.
- In the above snippet there are only two branches: master, from which the code will be pulled, and an intermediate branch used by the jgit-flow plug-in. As discussed, the jgit-flow plug-in uses branches to keep its records, so a development branch is created by the plug-in; it resides locally, not remotely, and tracks the release version etc.
- To put your releases into a repository manager, add these lines:
<distributionManagement>
  <repository>
    <id><auth id></id>
    <url><repo url of repository managers></url>
  </repository>
  <snapshotRepository>
    <id><auth id></id>
    <url><repo url of repository managers></url>
  </snapshotRepository>
</distributionManagement>
- Put the following lines into your ~/.m2/settings.xml with your repository manager credentials:
<settings>
  <servers>
    <server>
      <id><PUT THE ID OF THE REPOSITORY OR SNAPSHOTS ID HERE></id>
      <username><USERNAME></username>
      <password><PASSWORD></password>
    </server>
  </servers>
</settings>
Start Release jgit-flow maven plugin command
Finish Release jgit-flow maven plugin command
For example, I have created a repository on github.com for testing, with two branches: master-test and deploy-test. It is assumed that you have configured maven and git on your system.
$ mvn -Dmaven.test.skip=true jgitflow:release-start

This command will ask you for the release version and create a release branch prefixed with release/. It then pushes this release branch to the github repository temporarily, because we are not keeping the intermediate branches.
Now, at the end, run this command:
$ mvn -Dmaven.test.skip=true jgitflow:release-finish
After this command finishes, it deletes the release/ branch from both local and remote.
Now you can check the changes made to the pom file by jgitflow. In the above snapshot of the master-test branch, you can see in the version tag that it has removed the snapshot and also increased the version; it holds the current version of the application.

And the deploy-test branch shows the new development version that developers are working on.
What we intend to do
What all we will be doing to achieve it
- Finalize the SCM tool that we are going to use: puppet/chef/ansible.
- Automated setup of Jenkins using SCM tool.
- Automated setup of Nexus/Artifactory/Archiva using SCM tool.
- Automated setup of Sonar using SCM tool.
- Dev environment setup using the SCM tool: since this is a web app project, our Dev environment will have Nginx & Tomcat.
- QA environment setup using the SCM tool: since this is a web app project, our QA environment will have Nginx & Tomcat.
- Creation of various build jobs
- Code Stability Job.
- Code Quality Job.
- Code Coverage Job.
- Functional Test Job on dev environment.
- Creation of release Job.
- Creation of deployment job to do deployment on Dev & QA environment.
The reason behind this issue is that when you use git over the ssh protocol, it tries to use your private key to perform git operations, and the location it expects is the .ssh folder in the user’s home directory. To fix this issue, create a HOME environment variable pointing to the home directory that contains your .ssh folder, then restart Jenkins; after that it should work fine.
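On a Linux box this would look like the following (the path /var/lib/jenkins is an assumption; use the home directory of whatever user runs Jenkins):

```shell
# point HOME at the directory that actually contains the .ssh folder,
# so git over ssh can find the private key
export HOME=/var/lib/jenkins
echo "git will look for keys in: $HOME/.ssh"
```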
As promised, I’m back with a summary of the cool stuff that I’ve done with my team in the Build & Release domain to help us deal with day-to-day problems in an efficient and effective way. As I said, this month was about creating tools/utilities that sound very simple, but their overall impact on the productivity and agility of build & release teams and tech verticals was awesome :).
Automated deployment of artifacts: If you have ever worked with a set of interdependent maven projects, one of the major problems you will face is keeping the latest dependencies in your local system. Here I’m assuming two things: you use a maven repository to host the artifacts, and the dependencies are SNAPSHOT dependencies while active development is going on. The manual way of making sure the maven repo always has the latest SNAPSHOT version is that every time somebody changes the code-base, he/she manually deploys that artifact to the maven repo. What we have done instead is create, for each and every project, a Jenkins job that checks whether code has been checked in for a specific component and, if so, deploys that component’s SNAPSHOT version to the maven repo. The impact of these utility jobs was huge: developers no longer have to focus on deploying their code to the maven repo, and keeping track of who last committed the code is also not needed.
Log parser utility: We have further improved our event-based log analyzer utility. Now we also have a simple log parser through which we can parse the logs of a specific component and segregate them as ERROR/WARN/INFO. Most importantly, it is integrated with Jenkins: you go to Jenkins and select a component whose log needs to be analyzed; once the analysis is finished, the logs are segregated as per our configuration (in our case ERROR/WARN/INFO); the left bar then shows these categories with all their instances, and the user can click those links to jump exactly to where that information appears in the logs.
Auto code merge: As I already mentioned, we have a team of around 100+ developers and a sprint cycle of 10 days, with two sprints overlapping each other for 5 days: the first 5 days are for development, after that a code freeze is enforced, and the next 5 days are for bug fixing. This means that at any point in time there are 3 parallel branches on which work is in progress: one currently deployed in production, a second on which testing is happening, and a third on which active development is happening. You can easily imagine that merging these branches is a task in itself. So we created an automated code-merge utility that tries to merge branches in a pre-defined sequence; if the automatic merge succeeds, it proceeds to the next set of branches, otherwise a mail is sent to the developers whose files are in conflict.
I hope these utilities motivate you; do come up with new suggestions or points of improvement.
In our current setup the problem is more critical, as the development of two releases overlaps. We have a sprint cycle of 10 days: 5 days of active development, after which a code freeze is implemented, and the remaining 5 days are only for bug fixes. The next sprint starts just after the code-freeze date of the previous release. Ideally this setup should work well, but the biggest assumption behind successful execution of the process is that there should be minimal code check-ins after the code freeze, and that usually doesn’t happen. This results in parallel development in two branches, and therefore lots of code conflicts when merging them.
The real problem starts when we begin merging code: currently there are close to 100 developers working on the same code-base, which means a huge list of files in conflict, and you have to chase down each and every person to resolve those conflicts. To overcome this problem we are planning to do two things.
First, instead of merging after a long interval, we plan to increase the merge frequency from once in 5 days to twice a day, which should reduce the list of conflicting files.
As I always strive to automate things as much as possible, the second part is to at least create an automated tool that performs a dummy merge of two branches and lists all the files that would end up in conflict, along with the last users who modified those files in each branch.
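A rough sketch of such a dummy-merge check (a simplified assumption of the planned tool, demonstrated on two local branches with a deliberate conflict):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email dev@example.com && git config user.name dev
echo base > app.java && git add . && git commit -qm "base"
git checkout -qb sprint-2
echo sprint2 > app.java && git commit -qam "sprint-2 work"
git checkout -q "@{-1}"
echo sprint1 > app.java && git commit -qam "sprint-1 fix"
# dummy merge: attempt it, list the conflicts, then throw the attempt away
if ! git merge --no-commit --no-ff sprint-2 >/dev/null 2>&1; then
  for f in $(git diff --name-only --diff-filter=U); do
    echo "CONFLICT: $f (last modified by" \
         "$(git log -1 --format=%an HEAD -- "$f") here," \
         "$(git log -1 --format=%an sprint-2 -- "$f") on sprint-2)"
  done
  git merge --abort   # leave the working tree clean again
fi
```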
We are expecting a 60-70% improvement in the efficiency of the code merge process; let’s see how things go. Feel free to drop any ideas you have, or any concerns :).
Although I have tried to be as generic as possible, just to let you know, we are using Git as our version control system.
The first solution is to fix the issue in the issue branch and merge the issue branch with master again. This can work, but the problem with this approach is that until you have fixed the issue introduced by the issue branch and merged it back into master, you cannot create a new issue branch, and a much bigger problem is that you can’t release.
The second solution is to somehow revert the merge of the issue branch into the main branch, then fix the issue in the issue branch, and after that merge the fixed issue branch back into the main branch. This approach seems more straightforward and logical. Git comes with a cool command, git-revert, which can revert existing commits and even revert the merge of another branch. I’ll talk about the solution in the next blog 😉
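As a small preview, reverting a merge looks like this (a minimal sketch; the branch and file names are made up, and -m 1 tells git-revert to keep the side of the first parent, i.e. the main branch):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email dev@example.com && git config user.name dev
echo stable > main.txt && git add . && git commit -qm "stable"
git checkout -qb issue-branch
echo buggy > feature.txt && git add . && git commit -qm "issue work"
git checkout -q "@{-1}"
git merge -q --no-ff -m "merge issue-branch" issue-branch
# undo the merge: -m 1 keeps the first (main) parent's side
git revert -m 1 --no-edit HEAD
ls          # main.txt only; feature.txt is gone again
```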