Unix File Tree Part-2

For those who have surfed straight to this blog, please check out the previous part of this series Unix File Tree Part-1 and those who have stayed tuned for this part, welcome back.In the previous part, we discussed the philosophy and the need for file tree. In this part, we will dive deep into the significance of each directory.

Image result for horizontal file tree linux

Dayum!! that’s a lot of stuff to gulp at once, we’ll kick out things one after the other.

Major directories

Let’s talk about the crucial directories which play a major role.

  • /bin: When we started crawling on Linux this helped us to get on our feet yes, you read it right whether you want to copy any file, move it somewhere, create a directory, find out date, size of a file, all sorts of basic operations without which the OS won’t even listen to you (Linux yawning meanwhile) happens because of the executables present in this directory. Most of the programs in /bin are in binary format, having been created by a C compiler, but some are shell scripts in modern systems.
  • /etc: When you want things to behave the way you want, you go to /etc and put all your desired configuration there (Imagine if your girlfriend has an /etc life would have been easier). whether it is about various services or daemons running on your OS it will make sure things are working the way you want them to.
  • /var: He is the guy who has kept an eye over everything since the time you have booted the system (consider him like Heimdall from Thor). It contains files to which the system writes data during the course of its operation. Among the various sub-directories within /var are /var/cache (contains cached data from application programs), /var/games(contains variable data relating to games in /usr), /var/lib (contains dynamic data libraries and files), /var/lock (contains lock files created by programs to indicate that they are using a particular file or device), /var/log (contains log files), /var/run (contains PIDs and other system information that is valid until the system is booted again) and /var/spool (contains mail, news and printer queues).
  • /proc: You can think of /proc just like thoughts in your brain which are illusions and virtual. Being an illusionary file system it does not exist on disk instead, the kernel creates it in memory. It is used to provide information about the system (originally about processes, hence the name). If you navigate to /proc The first thing that you will notice is that there are some familiar-sounding files, and then a whole bunch of numbered directories. The numbered directories represent processes, better known as PIDs, and within them, a command that occupies them. The files contain system information such as memory (meminfo), CPU information (cpuinfo), and available filesystems.
  • /opt: It is like a guest room in your house where the guest stayed for prolong period and became part of your home. This directory is reserved for all the software and add-on packages that are not part of the default installation.
  • /usr: In the original Unix implementations, /usr was where the home directories of the users were placed (that is to say, /usr/someone was then the directory now known as /home/someone). In current Unixes, /usr is where user-land programs and data (as opposed to ‘system land’ programs and data) are. The name hasn’t changed, but its meaning has narrowed and lengthened from “everything user related” to “user usable programs and data”. As such, some people may now refer to this directory as meaning ‘User System Resources’ and not ‘user’ as was originally intended.

Potato or Potaaato what is the difference? 

We’ll be discussing those directories which confuse us always, which have almost a similar purpose but still are in separate locations and when asked about them we go like ummmm…….

/bin vs /usr/bin vs /sbin vs /usr/local/bin

This might get almost clear out when I explained the significance of /usr in the above paragraph. Since Unix designers planned /usr to be the local directories of individual users so it contained all of the sub-directories like /usr/bin, /usr/sbin, /usr/local/bin. But the question remains the same how the content is different?

/usr/bin:

  • /usr/bin is a standard directory on Unix-like operating systems that contains most of the executable files that are not needed for booting or repairing the system. 
  • A few of the most commonly used are awk, clear, diff, du, env, file, find, free, gzip, less, locate, man, sudo, tail, telnet, time, top, vim, wc, which, and zip.

/usr/sbin:

  • The /usr/sbin directory contains non-vital system utilities that are used after booting.
  • This is in contrast to the /sbin directory, whose contents include vital system utilities that are necessary before the /usr directory has been mounted (i.e., attached logically to the main filesystem). 
  • A few of the more familiar programs in /usr/sbin are adduser, chroot, groupadd, and userdel. 
  • It also contains some daemons, which are programs that run silently in the background, rather than under the direct control of a user, waiting until they are activated by a particular event or condition such as crond and sshd.

I hope I have covered most of the directories which you might come across frequently and your questions must have been answered.
Now that we know about the significance of each UNIX directory, It’s time to use them wisely the way they are supposed to be.
Please feel free to reach me out for any suggestions.
Goodbye till next time!

References: https://www.tldp.org/LDP/Linux-Filesystem-Hierarchy/html/usr.htmlhttps://askubuntu.com/questions/130186/what-is-the-rationale-for-the-usr-directoryhttps://askubuntu.com/questions/308045/differences-between-bin-sbin-usr-bin-usr-sbin-usr-local-bin-usr-localhttp://index-of.es/Varios-2/How%20Linux%20Works%20What%20Every%20Superuser%20Should%20Know.pdf
https://imgflip.com/memegenerator

Unix File Tree Part-1

Related image

Nature has its own way to reach out for perfection and the same should be our instinct to make our creations perfect.

Dennis Ritchie, father of Unix and an esteemed computer scientist might have implied the same approach for Unix directory structure.

Why?

Before getting into the hierarchy of Unix File Tree lets discuss why we need it. The need for a directory structure arises when multiple users are handling multiple software along with their dependent files. Let me explain this with a couple of scenarios.

Scenario-1:

Consider an ideal software or package which requires multiple files to function properly.

  • Binary files
  • Configuration files
  • Log files
  • Data files
  • Metadata files during execution
  • Libraries

 For now, let’s consider there is just one directory and I am keeping all of the dependent files in that directory. 

$ ls
package-1.binary  package-1.conf  package-1.data  package-1.lib  package-1.log  package-1.tmp

Another software comes in the picture which has its own dependent files.

$ ls
package-1.binary  package-1.data  package-1.log  package-2.binary  package-2.data  package-2.log
package-1.conf    package-1.lib   package-1.tmp  package-2.conf    package-2.lib   package-2.tmp

Things will get messy while dealing with various software since handling them won’t be easy and will lead to a chaotic situation.

Scenario-2:

Suppose I am a system admin and managing all of the software in the above scenario-1. To make things organized I created different directories to place the dependent files.
  • Binary files –> /dir-1
  • Configuration files –> /dir-2
  • Log files –> /dir-3
  • Data files –> /dir-4
  • Meta files –> /dir-5
  • Libraries –> /dir-6

As the work gets overloaded I need more admins to support they won’t be able to relate with the naming convention as I did.
To escape this situation the creator of Unix decided to follow a philosophy “Convention over Configuration”.
 As the name suggest giving priority to defined convention over individual’s configuration. So that everyone should be on the same page and keeping that in mind everyone else will follow.
And the simulation of the philosophy was like this

  • Binary files –> /bin
  • Configuration files –> /etc
  • Log files –> /log
  • Data files –> /var
  • Meta files –> /tmp
  • Libraries –> /lib

Which resulted in the Unix File Tree

$ tree -d -L 1
.
├── apps
├── bin
├── boot
├── dev
├── etc
├── home
├── lib
├── lib64
├── lost+found
├── media
├── mnt
├── opt
├── proc
├── root
├── run
├── sbin
├── snap
├── srv
├── sys
├── tmp
├── usr
└── var

22 directories

You might be thinking that how will Unix figure out where is the configuration file, where is the binary and rest of the stuff of the software.
Here comes the role of the PATH variable


$ echo $PATH
/home/dennis/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
These are environment variables specifying a set of directories where executable programs are located. In general, each executing process or user session has its own PATH setting.

So now we have a proper understanding of why do we need a File Tree.
For diving deep into the significance of each one of the directory stay tuned for Unix File Tree Part-2.

Cheers!

Git-Submodule

Rocket Science has always fascinated me, but one thing which totally blows my mind is the concept of modules aka. modular rockets. The literal definition of modules statesA modular rocket is a type of multistage rocket which features components that can be interchanged for specific mission requirements.” In simple terms, you can say that the Super Rocket depends upon those Submodules to get the things done.
Similarly is the case in the Software world, where super projects have multiple dependencies on other objects. And if we talk about managing projects Git can’t be ignored, Moreover Git has a concept of Submodules which is slightly inspired by the amazing rocket science of modules.

Hour of Need

Being a DevOps Specialist we need to do provisioning of the Infrastructure of our clients which is sometimes common for most of the clients. We decided to Automate it, which a DevOps is habitual of. Hence, Opstree Solutions initiated an Internal project named OSM. In which we create Ansible Roles of different opensource software with the contribution of each member of our organization. So that those roles can be used in the provisioning of the client’s infrastructure.
This makes the client projects dependent on our OSM. Which creates a problem statement to manage all dependencies which might get updated over the period. And to do that there is a lot of copy paste, deleting the repository and cloning them again to get the updated version, which is itself a hair-pulling task and obviously not the best practice.
Here comes the git-submodule as a modular rocket to take our Super Rocket to its destination.

Let’s Liftoff with Git-Submodules

A submodule is a repository embedded inside another repository. The submodule has its own history; the repository it is embedded in is called a superproject.

In simple terms, a submodule is a git repository inside a Superproject’s git repository, which has its own .git folder which contains all the information that is necessary for your project in version control and all the information about commits, remote repository address etc. It is like an attached repository inside your main repository, which can be used to reuse a code inside it as a “module“.
Let’s get a practical use case of submodules.
We have a client let’s call it “Armstrong” who needs few of our ansible roles of OSM for their provisioning of Infrastructure. Let’s have a look at their git repository below.

$    cd provisioner
$    ls -a
     .  ..  ansible  .git  inventory  jenkins  playbooks  README.md  roles
$    cd roles
$    ls -a
     apache  java   nginx  redis  tomcat
We can see in this Armstrong’s provisioner repository(a git repository) depends upon five roles which are available in OSM’s repository to help Armstrong to provision their infrastructure. So we’ll add submodules osm_java and others.

$    cd java
$    git submodule add -b armstrong git@gitlab.com:oosm/osm_java.git osm_java
     Cloning into './provisioner/roles/java/osm_java'...
     remote: Enumerating objects: 23, done.
     remote: Counting objects: 100% (23/23), done.
     remote: Compressing objects: 100% (17/17), done.
     remote: Total 23 (delta 3), reused 0 (delta 0)
     Receiving objects: 100% (23/23), done.
     Resolving deltas: 100% (3/3), done.

With the above command, we are adding a submodule named osm_java whose URL is git@gitlab.com:oosm/osm_java.git and branch is armstrong. The name of the branch is coined armstrong because to keep the configuration of each of our client’s requirement isolated, we created individual branches of OSM’s repositories on the basis of client name.
Now if take a look at our superproject provisioner we can see a file named .gitmodules which has the information regarding the submodules.

$    cd provisioner
$    ls -a
     .  ..  ansible  .git  .gitmodules  inventory  jenkins  playbooks  README.md  roles
$    cat .gitmodules
     [submodule "roles/java/osm_java"]
     path = roles/java/osm_java
     url = git@gitlab.com:oosm/osm_java.git
     branch = armstrong

Here you can clearly see that a submodule osm_java has been attached to the superproject provisioner.

What if there was no submodule?

If that was a case, then we need to clone the repository from osm and paste it to the provisioner then add & commit it to the provisioner phew….. that would also have worked.
But what if there is some update has been made in the osm_java which have to be used in provisioner, we can not easily sync with the OSM. We would need to delete osm_java, again clone, copy, and paste in the provisioner which sounds clumsy and not a best way to automate the process.
Being a osm_java as a submodule we can easily update that this dependency without messing up the things.

$    git submodule status
     -d3bf24ff3335d8095e1f6a82b0a0a78a5baa5fda roles/java/osm_java
$    git submodule update --remote
     remote: Enumerating objects: 3, done.
     remote: Counting objects: 100% (3/3), done.
     remote: Total 2 (delta 0), reused 2 (delta 0), pack-reused 0
     Unpacking objects: 100% (2/2), done.
     From git@gitlab.com:oosm/osm_java.git     0564d78..04ca88b  armstrong     -> origin/armstrong
     Submodule path 'roles/java/osm_java': checked out '04ca88b1561237854f3eb361260c07824c453086'

By using the above update command we have successfully updated the submodule which actually pulled the changes from OSM’s origin armstrong branch.

What have we learned? 

In this blog post, we learned to make use of git-submodules to keep our dependent repositories updated with our super project, and not getting our hands dirty with gullible copy and paste.
Kick-off those practices which might ruin the fun, sit back and enjoy the automation.

Referred links:
Image: google.com
Documentation: https://git-scm.com/docs/gitsubmodules


Gitlab-CI with Nexus

Recently I was asked to set up a CI- Pipeline for a Spring based application.
I said “piece of cake”, as I have already worked on jenkins pipeline, and knew about maven so that won’t be a problem. But there was a hitch, “pipeline of Gitlab CI“. I said “no problem, I’ll learn about it” with a Ninja Spirit.
So for starters what is gitlab-ci pipeline. For those who have already work on Jenkins and maven, they know about the CI workflow of  Building a code , testing the code, packaging, and deploy it using maven. You can add other goals too, depending upon the requirement.
The CI process in GitLab CI is defined within a file in the code repository itself using a YAML configuration syntax.
The work is then dispatched to machines called runners, which are easy to set up and can be provisioned on many different operating systems. When configuring runners, you can choose between different executors like Docker, shell, VirtualBox, or Kubernetes to determine how the tasks are carried out.

What we are going to do?
We will be establishing a CI/CD pipeline using gitlab-ci and deploying artifacts to NEXUS Repository.

Resources Used:

  1. Gitlab server, I’m using gitlab to host my code.   
  2. Runner server, It could be vagrant or an ec2 instance. 
  3. Nexus Server, It could be vagrant or an ec2 instance.

     Before going further, let’s get aware of few terminologies. 

  • Artifacts: Objects created by a build process, Usually project jars, library jar. These can include use cases, class diagrams, requirements, and design documents.
  • Maven Repository(NEXUS): A repository is a directory where all the project jars, library jar, plugins or any other project specific artifacts are stored and can be used by Maven easily, here we are going to use NEXUS as a central Repository.
  • CI: A software development practice in which you build and test software every time a developer pushes code to the application, and it happens several times a day.
  • Gitlab-runner: GitLab Runner is the open source project that is used to run your jobs and send the results back to GitLab. It is used in conjunction with GitLab CI, the open-source continuous integration service included with GitLab that coordinates the jobs.
  • .gitlab-ci.yml: The YAML file defines a set of jobs with constraints stating when they should be run. You can specify an unlimited number of jobs which are defined as top-level elements with an arbitrary name and always have to contain at least the script clause. Whenever you push a commit, a pipeline will be triggered with respect to that commit.

Strategy to Setup Pipeline

Step-1:  Setting up GitLab Repository. 

I’m using a Spring based code Spring3Hibernate, with a directory structure like below.
$    cd spring3hibernateapp
$    ls
     pom.xml pom.xml~ src

# Now lets start pushing this code to gitlab

$    git remote -v
     origin git@gitlab.com:/spring3hibernateapp.git (fetch)
     origin git@gitlab.com:/spring3hibernateapp.git (push)
# Adding the code to the working directory

$    git add -A
# Committing the code

$    git commit -m "[Master][Add] Adding the code "
# Pushing it to gitlab

$    git push origin master

Step-2:  Install GitLab Runner manually on GNU/Linux

# Simply download one of the binaries for your system:

$    sudo wget -O /usr/local/bin/gitlab-runner https://gitlab-runner-downloads.s3.amazonaws.com/latest/binaries/gitlab-runner-linux-386

# Give it permissions to execute:

$    sudo chmod +x /usr/local/bin/gitlab-runner 

# Optionally, if you want to use Docker, install Docker with:

$    curl -sSL https://get.docker.com/ | sh 

# Create a GitLab CI user:

$    sudo useradd --comment 'GitLab Runner' --create-home gitlab-runner --shell /bin/bash 

# Install and run as service:

$    sudo gitlab-runner install --user=gitlab-runner --working-directory=/home/gitlab-runner
$    sudo gitlab-runner start

Step-3: Registering a Runner

To get the runner configuration you need to move to gitlab > spring3hibernateapp > CI/CD setting > Runners
And get the registration token for runners.

# Run the following command:

$     sudo gitlab-runner register
       Runtime platform                                    arch=amd64 os=linux pid=1742 revision=3afdaba6 version=11.5.0
       Running in system-mode.                             

# Please enter the gitlab-ci coordinator URL (e.g. https://gitlab.com/):

https://gitlab.com/

# Please enter the gitlab-ci token for this runner:

****8kmMfx_RMr****

# Please enter the gitlab-ci description for this runner:

[gitlab-runner]: spring3hibernate

# Please enter the gitlab-ci tags for this runner (comma separated):

build
       Registering runner... succeeded                     runner=ZP3TrPCd

# Please enter the executor: docker, docker-ssh, shell, ssh, virtualbox, docker+machine, parallels, docker-ssh+machine, kubernetes:

docker

# Please enter the default Docker image (e.g. ruby:2.1):

maven       
       Runner registered successfully. Feel free to start it, but if it's running already the config should be automatically reloaded!

# You can also create systemd service in /etc/systemd/system/gitlab-runner.service.

[Unit]
Description=GitLab Runner
After=syslog.target network.target
ConditionFileIsExecutable=/usr/local/bin/gitlab-runner

[Service]
StartLimitInterval=5
StartLimitBurst=10
ExecStart=/usr/local/bin/gitlab-runner "run" "--working-directory" "/home/gitlab-runner" "--config" "/etc/gitlab-runner/config.toml" "--service" "gitlab-runner" "--syslog" "--user" "gitlab-runner"
Restart=always
RestartSec=120

[Install]
WantedBy=multi-user.target

Step-4: Setting up Nexus Repository
You can setup a repository installing the open source version of Nexus you need to visit Nexus OSS and download the TGZ version or the ZIP version.
But to keep it simple, I used docker container for that.
# Install docker

$    curl -sSL https://get.docker.com/ | sh

# Launch a NEXUS container and bind the port

$    docker run -d -p 8081:8081 --name nexus sonatype/nexus:oss

You can access your nexus now on http://:8081/nexus.
And login as admin with password admin123.

Step-5: Configure the NEXUS deployment

Clone your code and enter the repository

$    cd spring3hibernateapp/

# Create a folder called .m2 in the root of your repository

$    mkdir .m2

# Create a file called settings.xml in the .m2 folder

$    touch .m2/settings.xml

# Copy the following content in settings.xml

<settings xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.1.0 http://maven.apache.org/xsd/settings-1.1.0.xsd"
    xmlns="http://maven.apache.org/SETTINGS/1.1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  
    
      central
      ${env.NEXUS_REPO_USER}
      ${env.NEXUS_REPO_PASS}
    
    
      snapshots
      ${env.NEXUS_REPO_USER}
      ${env.NEXUS_REPO_PASS}
    
  

 Username and password will be replaced by the correct values using variables.
# Updating Repository path in pom.xml


     central
     Central
     ${env.NEXUS_REPO_URL}central/
   
 
          snapshots
          Snapshots
          ${env.NEXUS_REPO_URL}snapshots/
        

Step-6: Configure GitLab CI/CD for simple maven deployment.

GitLab CI/CD uses a file in the root of the repo, named, .gitlab-ci.yml, to read the definitions for jobs that will be executed by the configured GitLab Runners.
First of all, remember to set up variables for your deployment. Navigate to your project’s Settings > CI/CD > Variables page and add the following ones (replace them with your current values, of course):

Now it’s time to define jobs in .gitlab-ci.yml and push it to the repo:

image: maven

variables:
  MAVEN_CLI_OPTS: "-s .m2/settings.xml --batch-mode"
  MAVEN_OPTS: "-Dmaven.repo.local=.m2/repository"

cache:
  paths:
    - .m2/repository/
    - target/

stages:
  - build
  - test
  - package 
  - deploy

codebuild:
  tags:
    - build      
  stage: build
  script: 
    - mvn compile

codetest:
  tags:
    - build
  stage: test
  script:
    - mvn $MAVEN_CLI_OPTS test
    - echo "The code has been tested"

Codepackage:
  tags:
    - build
  stage: package
  script:
    - mvn $MAVEN_CLI_OPTS package -Dmaven.test.skip=true
    - echo "Packaging the code"
  artifacts:
    paths:
      - target/*.war
  only:
    - master  

Codedeploy:
  tags:
    - build
  stage: deploy
  script:
    - mvn $MAVEN_CLI_OPTS deploy -Dmaven.test.skip=true
    - echo "installing the package in local repository"
  only:
    - master

Now add the changes, commit them and push them to the remote repository on gitlab. A pipeline will be triggered with respect to your commit. And if everything goes well our mission will be accomplished.
Note: You might get some issues with maven plugins, which will need to managed in pom.xml, depending upon the environment.
In this blog, we covered the basic steps to use a Nexus Maven repository to automatically publish and consume artifacts.