The Linux terminal (typically running the Bourne Again Shell, Bash) can be intimidating for beginners and annoying for experienced programmers. But it is a necessity for most users of Linux, and in many situations it can also be a huge productivity enhancer over graphical interfaces.
Fish is a command-line shell that is a great complement to Bash. It is highly configurable and provides nice features, such as syntax coloring and TAB command completion, out of the box. For this tutorial, we will use a quick Docker set-up with a small, but still capable, Linux image (bitnami/minideb). Then, we will install fish and try some of the most common Linux commands. Fish isn’t necessary for most of these Bash commands; however, it can make your life a little easier while flying through various directories working on your next project.
Two great references that are much more in-depth than this tutorial are the following:
Preparation
Some terminology to know before working includes:
- terminal - text input/output environment
- console - physical terminal (older hardware-oriented term)
- command line - where user enters commands
- shell - command line interpreter
You, the user, will open a terminal and enter commands on the command line, which the fish shell then interprets.
Install docker for your operating system, then run Bitnami’s minideb (mini-debian) so that you have a disposable Linux container. You can break whatever you want in this container and not worry that you’ve hurt your actual operating system. To reset it, just create a new container.
#list current images
docker images
#obtain and run minideb container, detached, with a name to refer to later
docker run -dit --name cntr_minideb bitnami/minideb:latest
#enter container's bash interface application
docker exec -it cntr_minideb bash
#install and use fish
apt-get update && apt-get upgrade
apt-get install fish
fish
Before you get started on the basics, you might want to install two additional applications. The nano text editor can be much more intuitive than the more-common vim. Also, the Cheat application can help in your learning. Digital Ocean has a tutorial for getting started.
apt-get install nano
apt-get install python-pip
pip install cheat
cheat -v
Later, to exit the applications and shut down docker:
#from fish
exit
#from bash
exit
#stop the container (resume later with: docker start cntr_minideb)
docker stop cntr_minideb
#remove the container
docker rm cntr_minideb
Basic File System
These basic commands are common to many different shells. They include commands for working with files and directories, as well as some basic programming concepts, such as variables, that show up throughout your work in the terminal.
Quickstart
#directory structure
echo $HOME
mkdir <dir>
cd <dir>
pwd
tree
#display file statistics
stat <file>
file <file>
#display help
man <command>
help <command>
type <command>
whatis <command>
Variables
One way that the shell keeps track of all of these settings and details is through an area it maintains called the environment.
Environment variables are variables that are defined for the current shell and are inherited by any child shells or processes. They are used to pass information into processes that are spawned from the shell.
Shell variables are variables that are contained exclusively within the shell in which they were set or defined. They are often used to keep track of ephemeral data, like the current working directory.
By convention, these types of variables are usually defined using all capital letters.
Environment variables
#environment variables
printenv
#print specific variable
printenv <var>
#set a variable for a single command only
env VAR1="blahblah" command_to_run command_options
Shell variables
#shell variables
set | less
#without functions (bash)
(set -o posix; set) | less
Typical environment variables:
#describes the shell
SHELL
#specifies the type of terminal to emulate when running the shell
TERM
#current logged in user
USER
#current working directory.
PWD
#previous working directory
OLDPWD
#defines color codes used to add colored output to the ls command
LS_COLORS
#path to the current user's mailbox
MAIL
#list of (ordered) directories that the system will check when looking for commands
PATH
#current language and localization settings
LANG
#current user's home directory
HOME
#previously executed command
_
Common shell variables:
#list of options that were used when bash was executed
BASHOPTS
#version
BASH_VERSION
BASH_VERSINFO
#number of columns wide that are being used to draw output
COLUMNS
#stack of directories that are available with the pushd and popd commands
DIRSTACK
#number of lines of command history stored to a file
HISTFILESIZE
#number of lines of command history allowed in memory
HISTSIZE
#The hostname of the computer at this time
HOSTNAME
#internal field separator used to split input on the command line; space, tab, and newline by default
IFS
#primary command prompt definition
PS1
#secondary prompts for when a command spans multiple lines
PS2
#shell options that can be set with the set option.
SHELLOPTS
#the UID of the current user
UID
Test shell variables:
#define shell var
TEST_VAR='Hello World!'
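#confirm it is a shell variable
set | grep TEST_VAR
#prints nothing: it is not in the environment yet
printenv | grep TEST_VAR
#reference
echo $TEST_VAR
#shell vars are not passed to child processes
bash
echo $TEST_VAR
#return to the parent shell
exit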
Test environment variables:
#export shell var to the environment
export TEST_VAR
#available in child process
bash
echo $TEST_VAR
exit
#convert back to shell var
export -n TEST_VAR
#remove completely
unset TEST_VAR
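The same behaviour can be checked in a script, without opening nested interactive shells. This is a minimal sketch in bash-style syntax (it will not run in fish); `DEMO_VAR` and `DEMO_ONE_OFF` are hypothetical names chosen for illustration, and `sh -c` stands in for any child process.

```shell
# DEMO_VAR and DEMO_ONE_OFF are hypothetical names used only in this sketch.
DEMO_VAR='Hello World!'

# A plain shell variable is not visible to a child process.
before_export=$(sh -c 'echo "${DEMO_VAR:-unset}"')

# Once exported, the variable is inherited by child processes.
export DEMO_VAR
after_export=$(sh -c 'echo "${DEMO_VAR:-unset}"')

# env sets a variable for a single command without touching the current shell.
one_off=$(env DEMO_ONE_OFF='temporary' sh -c 'echo "$DEMO_ONE_OFF"')

echo "before export: $before_export"
echo "after export:  $after_export"
echo "one-off:       $one_off"
echo "in this shell: ${DEMO_ONE_OFF:-unset}"
```

The `${NAME:-unset}` form prints a placeholder when the variable is missing, which makes the difference between the cases easy to see.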
Typical usage:
nano ~/.bash_profile
#add the line: export TEST_VAR='Hello World!'
#save and exit
source ~/.bash_profile #instead of restarting the terminal
Creation, redirection, and wildcards
These commands are a little more advanced, but you see them often in instructions. Once you get comfortable using them, they become very helpful.
#new file
touch <filename>
#write output to a file, overwriting it
<command> > <filename>
#append output to a file
<command> >> <filename>
#pipe output of left to input on right
<command> | <command>
A common usage of the pipe:
printenv | less
Additional operators:
#takes input from the file on the right instead of the keyboard
<
#redirect standard error to the location on the right (older fish releases also accepted ^)
2>
#match any character string that does not include "/"
*
#match any single character, not including "/"
?
#match any string including "/", recursively (fish; bash needs shopt -s globstar)
**
Examples of the above commands:
#list .conf files in that directory and child directories (fish syntax)
ls /etc/**.conf
#standard output to file, standard error to where standard output goes (same file)
ls /etc > ls_results.txt 2>&1
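To see the operators side by side, here is a small sketch that works in a throwaway directory. It is written for bash (or another POSIX-style shell) and uses the `2>&1` spelling for standard error; `notes.txt` and `missing-demo-dir` are hypothetical names, the latter chosen because it does not exist.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# > creates the file, or overwrites it if it already exists
echo "first" > notes.txt
echo "second" > notes.txt
# >> appends to the end of the file
echo "third" >> notes.txt

# < feeds the file to the command's standard input
line_count=$(wc -l < notes.txt)

# | sends the output of the left command to the input of the right command
last_line=$(cat notes.txt | tail -n 1)

# 2>&1 sends standard error to wherever standard output is going (the same file).
# missing-demo-dir does not exist, which forces an error message.
ls . ./missing-demo-dir > ls_results.txt 2>&1 || true

echo "lines: $line_count, last line: $last_line"
```

Afterwards, `ls_results.txt` holds both the directory listing and the error message, because both streams were pointed at the same file.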
Command history and TAB completion
The history of your previous commands is key for moving quickly and remembering completed tasks. You can move up chronologically in your history by using the UP key. You can move in the reverse direction by using the DOWN key. This is fairly standard. If we wish to return to our prompt, we just hit the ESCAPE key. We can also type in part of a previous command and then press the UP key to search for the latest instances of that specific command. Furthermore, we can use the ALT-UP and ALT-DOWN keys to recall the command line arguments only.
#previous commands
history
#last argument of the previous command (bash; fish has no history expansion)
<cmd> !$
<cmd> $_
#repeat the last command (bash)
!!
File manipulation
Download files or get them via http request, then familiarize yourself with them.
#download
wget
#http requests
curl
#concatenate
cat
#word count
wc
#print the first lines of a file
head
#print the last lines of a file
tail
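The local commands above can be tried without downloading anything. The following sketch generates two small files with `seq` (the file names are hypothetical) and then applies `cat`, `wc`, `head`, and `tail` to them; `wget` and `curl` are left out because they need a network connection.

```shell
workdir=$(mktemp -d)

# Build two small sample files with seq
seq 1 5 > "$workdir/part1.txt"
seq 6 10 > "$workdir/part2.txt"

# cat concatenates its arguments in order
cat "$workdir/part1.txt" "$workdir/part2.txt" > "$workdir/all.txt"

# wc -l counts lines; reading from standard input keeps the file name out of the output
total_lines=$(wc -l < "$workdir/all.txt")

# head and tail print the first and last lines of a file
first_three=$(head -n 3 "$workdir/all.txt")
last_two=$(tail -n 2 "$workdir/all.txt")

echo "lines: $total_lines"
echo "$first_three"
echo "$last_two"
```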
System Administration
Administration of a Linux server is a huge field. External references are provided, in various sections, to give more detailed explanation.
Machine configuration
These are basic commands to understand your machine and prepare it for use.
#os version
cat /etc/os-release
#kernel version
uname -r
#number of cores
nproc
python -c 'import multiprocessing as mp; print(mp.cpu_count())'
#bring installed packages up to date
sudo apt-get update && sudo apt-get upgrade
Displaying disk usage
To understand the foundations around these commands, take a look at the following:
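#size of each item one level below a directory, smallest to largest
du --max-depth=1 --human-readable /home/vagrant/ | sort --human-numeric-sort
#the same, using short options
du -d1 -h /home/ubuntu | sort -h
#total size of the current directory
du -sh
#free and used space on each mounted file system
df -h
#free and used inodes
df -i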
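As an illustration of reading `du` output, this sketch builds a throwaway directory with one small and one large subdirectory (hypothetical names) and picks out the largest child. It uses `-k` with a plain numeric sort instead of `-h` with `sort -h`, purely so the result is easy to parse in a script.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/small" "$workdir/large"

# Random data avoids filesystems that compress or skip runs of zeros
head -c 4096 /dev/urandom > "$workdir/small/a.bin"
head -c 1048576 /dev/urandom > "$workdir/large/b.bin"

# -k reports kibibytes so a plain numeric sort works; -d1 limits depth to one level
report=$(du -k -d1 "$workdir" | sort -n)
echo "$report"

# The last line is the total for workdir itself; the line before it is the largest child
largest=$(echo "$report" | tail -n 2 | head -n 1 | cut -f2)
echo "largest subdirectory: $largest"
```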
Describing CPU and memory usage
This is another field where you really need to know the internals to understand what you’re looking at. Get a better understanding of cpu utilization, here.
#the perf tools must match the running kernel
sudo apt-get install linux-tools-common linux-tools-generic linux-tools-$(uname -r)
perf stat -a -- sleep 10
Notes
- IPC (instructions per cycle) < 1.0: you are likely memory stalled. Software tuning strategies include reducing memory I/O and improving CPU caching and memory locality
- IPC > 1.0: you are likely instruction bound. Look for ways to reduce code execution: eliminate unnecessary work, cache operations, etc. CPU flame graphs are a great tool for this investigation
Installing applications
This is a very old field and you will likely encounter many types of applications of varying history and quality. Getting to know how to install applications and build packages can give you a greater appreciation before you inevitably do it yourself.
#pre-approved sources to get packages from
nano /etc/apt/sources.list
#install package
dpkg -i <some_deb_package.deb>
#remove package
dpkg -r <package_name>
#list packages
dpkg -l
#package manager: advanced package tool
apt install <package>
apt list
apt moo
Building packages
This may include many other commands based upon the language of the application.
#build tools
apt install build-essential
#unpack the source
tar -xzvf <package.tar.gz>
cd <package>
#check dependencies and generate the Makefile
./configure
#compile
make
#install: copy the correct files to the correct locations on your computer
make install
#uninstall
make uninstall
#alternatively, build a .deb and install it instead of running make install
checkinstall
Processes
The kernel is the software between applications and hardware. It manages processes and the resources provided to them. The Linux kernel is named for Linus Torvalds, who first created it as a replacement for Bell’s proprietary Unix kernel. He also created the git version control system, and, so, his discussions and comments on web boards come with a healthy dose of ego.
#example process
sleep 1
#run in background
sleep 1 &
jobs
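#all processes with details, including those without a tty
ps aux
#live view, refreshed every 10 seconds
top -d 10
#friendlier live view (install htop first)
htop
#lower a process's priority by setting its niceness to 10
renice 10 -p <pid>
#process tree with pids
pstree -p
#find processes by name
pgrep <process>
#kill processes by name
pkill <process>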
#process filesystem
ls /proc
#process state
cat /proc/<pid>/status
#find from what directory process is run
pwdx <pid>
ps -ef
#what processes are listening (netstat is part of net-tools)
netstat -lntp
#how long the system has been running
uptime
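The life cycle of a background process can be followed in a short script. This is a sketch for bash or another POSIX-style shell; it starts `sleep` in the background, checks that it exists, terminates it, and collects its exit status.

```shell
# Start a long-running process in the background; $! holds its PID
sleep 30 &
pid=$!

# kill -0 sends no signal; it only checks that the process exists
if kill -0 "$pid" 2>/dev/null; then before="running"; else before="gone"; fi

# Ask the process to terminate (SIGTERM), then collect its exit status
kill "$pid"
exit_status=0
wait "$pid" 2>/dev/null || exit_status=$?

if kill -0 "$pid" 2>/dev/null; then after="running"; else after="gone"; fi

echo "before: $before, after: $after, exit status: $exit_status"
```

A process ended by a signal reports 128 plus the signal number, so a SIGTERM (signal 15) shows up as exit status 143.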
Routing
Routing is a HUGE subject, but these commands can give you good support for the first 20%.
#legacy tools
apt install net-tools
#kernel routing table
route -n
#interface configuration
ifconfig
#current replacement for net-tools
apt install iproute2
#all interfaces
ip link show
#stats of an interface
ip -s link show eth0
#show ip addresses allocated to interfaces
ip address show
#network manager (nmcli replaced the older nm-tool)
apt install network-manager
nmcli device status
Background shells
Multiple screens will become much more useful as you become more capable from the command line. There are three main systems; take a look at these references for a comparison of them.
nohup
screen
tmux
Profiles
Once you have commands down, you will want to start using the terminal on your host machine and customizing it to get your work done more quickly. There are two main files for customizing your system.
~/.bashrc
- Save aliases, shell settings, and functions you commonly use in ~/.bashrc, and arrange for login shells to source it. This will make your setup available in all your shell sessions.
~/.bash_profile
- Put the settings of environment variables as well as commands that should be executed when you login in ~/.bash_profile. Separate configuration will be needed for shells you launch from graphical environment logins and cron jobs.
A login shell is a shell session that begins by authenticating the user. If you are signing into a terminal session or through SSH and authenticate, your shell session will be set as a “login” shell. A session started as a login session will read configuration details from the /etc/profile file first. It will then look for the first login shell configuration file in the user’s home directory to get user-specific configuration details. It reads the first file that it can find out of ~/.bash_profile, ~/.bash_login, and ~/.profile and does not read any further files.
A non-login shell is created if you start a new shell session from within your authenticated session, like we did by calling the bash command from the terminal. You were not asked for your authentication details when you started your child shell. A non-login shell will read /etc/bash.bashrc and then the user-specific ~/.bashrc file to build its environment.
An interactive shell session, such as one that begins with ssh, is a shell session that is attached to a terminal. A non-interactive shell session is one that is not attached to a terminal session, such as a script run from the command line. Non-interactive shells read the environment variable called BASH_ENV and read the file it specifies to define the new environment.
We will usually be setting user-specific environment variables, and we usually will want our settings to be available in both login and non-login shells. This means that the place to define these variables is in the ~/.bashrc file.
If you need to set system-wide variables, you may want to think about adding them to /etc/profile, /etc/bash.bashrc, or /etc/environment.
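One common way to arrange for login shells to source ~/.bashrc is a short check in ~/.bash_profile. The sketch below tries this in a throwaway home directory rather than your real one; `demo_home` and `DEMO_FROM_BASHRC` are hypothetical names, and the approach assumes bash is installed.

```shell
# demo_home is a throwaway stand-in for your real home directory
demo_home=$(mktemp -d)

# ~/.bashrc: settings wanted in every shell (DEMO_FROM_BASHRC is a hypothetical variable)
cat > "$demo_home/.bashrc" <<'EOF'
DEMO_FROM_BASHRC="loaded"
EOF

# ~/.bash_profile: read by login shells; hand over to ~/.bashrc if it exists
cat > "$demo_home/.bash_profile" <<'EOF'
if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi
EOF

# Start a login shell with the throwaway home and check that ~/.bashrc was read
login_result=$(HOME="$demo_home" bash --login -c 'echo "${DEMO_FROM_BASHRC:-not loaded}"' 2>/dev/null | tail -n 1)
echo "login shell: $login_result"
```

With the three-line check in place, settings kept in ~/.bashrc reach login shells as well, so they only need to be written once.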
Alias and functions
This is another way to customize your system. For more detailed explanation, try this tutorial.
alias ls="ls --group-directories-first --color"
cdls() { cd "$@" && ls; }
cddu() { cd "$@" && du -d1 -h . | sort -h; }
du1() { du -d1 -h "$@" | sort -h; }
du2() { du -d2 -h "$@" | sort -h; }
Path
Your PATH is an ordered list of directories where the shell looks for commands. Because it is ordered, the first match for a command name is the one that is used.
export PATH="$PATH:/path/to/dir"
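The effect of ordering can be seen by placing the same command name in two directories. This is a sketch under stated assumptions: `first_dir`, `second_dir`, and `demo_cmd` are hypothetical, and the temporary directory must allow executable files.

```shell
# first_dir and second_dir are throwaway directories; demo_cmd is a hypothetical command name
first_dir=$(mktemp -d)
second_dir=$(mktemp -d)
for dir in "$first_dir" "$second_dir"; do
    printf '#!/bin/sh\necho "demo"\n' > "$dir/demo_cmd"
    chmod +x "$dir/demo_cmd"
done

# Run each lookup in a subshell so the PATH change does not leak into this shell.
# command -v prints the path the shell would run for that name.
first_wins=$(PATH="$first_dir:$second_dir:$PATH"; command -v demo_cmd)
second_wins=$(PATH="$second_dir:$first_dir:$PATH"; command -v demo_cmd)

echo "$first_wins"
echo "$second_wins"
```

Appending a directory to the end of PATH, as in the export line above, means existing commands with the same name keep winning; prepending would let the new directory override them.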
User Management
Access and groups
Access control can take a while to become comfortable with. Stay with it, as security and protection are a necessity for every use.
#list of groups
cat /etc/group
#list all users in a Linux group
getent group <groupname>
#list all groups you belong to
groups
#create new user
sudo adduser <name>
#remove user
sudo deluser <name>
#add user to group
sudo usermod -aG <name-of-group> <name-of-user>
#remove user from group
sudo gpasswd -d <user> <group>
SSH to remote server
#create a key pair, copy the public key to the server, then log in to remote
ssh-keygen -t rsa
ssh-copy-id githubstats@144.76.39.53
ssh githubstats@144.76.39.53
VPS system admin
ref(user-config): https://www.digitalocean.com/community/tutorials/how-to-use-git-to-manage-your-user-configuration-files-on-a-linux-vps
Advanced File System
Copying, moving, and merging files
#copy without subfolders
find worker-1/data/db/* -type f -exec cp {} data_all_Random-Net/ \;
python integrate_databases.py --src ~/data_ALL/* --dest ~/COMBINED.db
# Data files: moving
#merge
rsync -aP worker-1/data/db/* data_all/
rsync -aP worker-2/data/db/* data_all/
Copy files to remote server
#local to remote
scp file.txt <remote_username>@<ip_address>:/remote/directory
scp file.txt jason@10.10.0.2:/remote/directory
#remote to local
scp <remote_username>@<host_name>:/from/directory/<file> /to/directory/
scp jason@hetzner.com:/mnt/jason/file.xml /Users/jason/Desktop/
Find files, directories, commands, and applications
This is another topic where learning all of the functionality can take a while, but it can make your life amazingly easier in the long run. Refer back to it over time, and reference a few of these tutorials at Digital Ocean and Binary Tides.
#files, dir
find .
find <location> <comparison-criteria> "<search-term>"
find <path> -maxdepth <2> -type <f,d> -iname "<name_ignore_case>" ! -iname "<name_to_exclude>"
find / -ipath "<path_ignore_case>"
find / -lname "<symbolic_link>"
find / -regex "<name_by_regex>"
#find directories and remove them
find . -maxdepth 1 -type d -name "feature-*" -exec rm -rf {} \;
#check everywhere
locate <filename>
#commands
history | grep ssh
#application binaries
whereis -b <application>
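To get a feel for how the `find` tests combine, the sketch below builds a small throwaway tree (all file and directory names are hypothetical) and runs two searches against it.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/src/deep/deeper" "$workdir/feature-login"
touch "$workdir/README.md" "$workdir/src/Main.JS" "$workdir/src/util.js" \
      "$workdir/src/deep/deeper/hidden.js"

# Case-insensitive name match on regular files, at most two levels below workdir,
# skipping anything whose name starts with "util"
js_files=$(find "$workdir" -maxdepth 2 -type f -iname "*.js" ! -iname "util*" | sort)

# Directories only, directly under workdir
feature_dirs=$(find "$workdir" -maxdepth 1 -type d -name "feature-*")

echo "$js_files"
echo "$feature_dirs"
```

Only `src/Main.JS` survives the first search: `util.js` is excluded by the negated test, and `hidden.js` sits deeper than the depth limit.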
Find files that contain matching text
Searching among millions of files for specific text is a very powerful feature. Be sure to include this in your daily habits and it will be a life-saver for some future endeavor.
grep -rnw '/path/to/somewhere/' -e 'pattern'
# -r or -R is recursive
# -n line number
# -w match the whole word
# -i ignore case
# -l (lower-case L) can be added to just give the file name of matching files
Along with these, the --exclude, --include, and --exclude-dir flags can be used for efficient searching:
- This will only search through those files which have .c or .h extensions:
grep --include=\*.{c,h} -rnw '/path/to/somewhere/' -e "pattern"
- This will exclude searching all the files ending with .o extension:
grep --exclude=\*.o -rnw '/path/to/somewhere/' -e "pattern"
- For directories, it’s possible to exclude one or more directories through the --exclude-dir parameter. For example, this will exclude the dirs dir1/, dir2/ and all of them matching *.dst/:
grep --exclude-dir={dir1,dir2,*.dst} -rnw '/path/to/somewhere/' -e "pattern"
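The flags can be tried on a small throwaway tree. This sketch assumes GNU grep; the file names are hypothetical, and two separate `--include` flags are used in place of the brace form so that it does not depend on bash brace expansion.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/src" "$workdir/build"
echo 'int pattern = 1;'      > "$workdir/src/main.c"
echo 'extern int pattern;'   > "$workdir/src/util.h"
echo 'int patterns = 2;'     > "$workdir/src/other.c"
echo 'pattern in plain text' > "$workdir/src/notes.txt"
echo 'int pattern = 1;'      > "$workdir/build/main.c"

# -l prints only file names; -w rejects "patterns" because it is not a whole-word match
matches=$(grep -rlw --include='*.c' --include='*.h' --exclude-dir=build \
          -e 'pattern' "$workdir" | sort)
echo "$matches"
```

Only `src/main.c` and `src/util.h` are reported: `other.c` fails the whole-word test, `notes.txt` fails the include filter, and everything under `build/` is skipped.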
Perform command on multiple files
There are two ways you can apply the same command across multiple files. The first is a one-liner using find with -exec. The second is using a for loop. The examples use the following Closure Compiler command on a JavaScript file: java -jar compiler.jar --js filename.js --js_output_file newfilename.js
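#find and execute; {} is a path such as ./dir/file.js, so prefix only the file name
find . -name "*.js" -exec sh -c 'java -jar compiler.jar --js "$1" --js_output_file "$(dirname "$1")/new$(basename "$1")"' _ {} \;
#typical for loop; the glob carries no "./" prefix, so the new name is valid
for filename in *.js
do
java -jar compiler.jar --js "${filename}" --js_output_file "new${filename}"
done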
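Both patterns can be rehearsed without the compiler. In the sketch below, `cp` stands in for the `java -jar compiler.jar` call (an assumption made so the example runs anywhere), and the file names are hypothetical.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/lib"
echo 'console.log("a");' > "$workdir/a.js"
echo 'console.log("b");' > "$workdir/lib/b.js"
cd "$workdir" || exit 1

# for loop: the glob only sees the current directory, so names carry no "./" prefix
for filename in *.js
do
    cp "$filename" "new${filename}"
done

# find -exec: {} is a path such as ./lib/b.js, so split it into directory and
# file name before adding the prefix; sh -c receives the path as $1.
# ! -name "new*" keeps find from picking up files created by this same run.
find ./lib -name "*.js" ! -name "new*" -exec sh -c \
    'cp "$1" "$(dirname "$1")/new$(basename "$1")"' _ {} \;

results=$(find . -name "new*.js" | sort)
echo "$results"
```

The for loop is easier to read for a single directory, while `find` is the better fit once the files are spread over subdirectories.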
Compress and archive
#create zip (automatic compression)
zip -r <desired_file.zip> <dir_path>
#exclude file from zip (quote the pattern so the shell does not expand it)
zip -r <desired_file.zip> <dir_path> -x "*.<file_extension>"
#add to existing zip
zip -r <existing_file.zip> <path/to/dir>
#tar compress, maintains folder structure
tar -czvf <desired_file.tar.gz> <dir_path>
#list files
tar -tf <tarfilename>
#uncompress, extract
tar -xzvf <tarball_file>
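A full round trip with `tar` looks like this. The sketch works in a throwaway directory with hypothetical names, and uses `-C` to control which directory the paths inside the archive are relative to.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/project/docs" "$workdir/restore"
echo "hello" > "$workdir/project/docs/readme.txt"

# -C switches directory first, so the archive stores the relative path project/...
tar -czf "$workdir/project.tar.gz" -C "$workdir" project

# -t lists the contents without extracting
listing=$(tar -tzf "$workdir/project.tar.gz")
echo "$listing"

# Extract into a different directory; the folder structure is recreated
tar -xzf "$workdir/project.tar.gz" -C "$workdir/restore"
restored=$(cat "$workdir/restore/project/docs/readme.txt")
echo "restored content: $restored"
```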
Delete and trash (safe-delete)
#permanent delete
rm <file>
#safe delete: move to the trash instead
sudo apt-get install trash-cli
trash <file>
trash-list
trash-restore <file>
trash-empty
Processing Files
This subject is quite popular after the big data and data science movement. Being able to work with files as they exist on the file system, without bringing them into a memory-hungry processing environment, such as R or Python, can be a life-saver.
These are a few popular resources for learning more:
Count files in dir
echo 'Files: '; find . -maxdepth 1 -type f | wc -l
echo 'Files in subs: '; find . -type f | wc -l
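The difference between the two counts is easiest to see on a known tree. This sketch uses `find` with a depth limit for the top-level count; the directory layout is hypothetical.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/sub"
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/sub/c.txt"

# Regular files directly in the directory (subdirectories are not counted)
top_level=$(find "$workdir" -maxdepth 1 -type f | wc -l)

# Regular files in the directory and everything below it
recursive=$(find "$workdir" -type f | wc -l)

echo "Files: $top_level"
echo "Files in subs: $recursive"
```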
Data files: summarizing
#check number of lines
wc -l Data/taxi/green_tripdata_2015-09.csv
#check first few lines
head -n3 Data/taxi/green_tripdata_2015-09.csv
Regex
env | grep -i User
Tools
For YAML, use shyaml. For Amazon S3, s3cmd is convenient and s4cmd is faster. Amazon’s aws and the improved saws are essential for other AWS-related tasks.
- jq - sed for JSON; For interactive use, also see jid and jiq
- json2csv - convert JSON to CSV
- csvkit - suite of utilities for working with CSV; provides in2csv, csvcut, csvjoin, csvgrep, etc.
- scrape - HTML extraction using XPath or CSS selectors
- xml2json - convert XML to JSON
Version Control and Github
The awesome github page provides a good understanding of the functionality that made GitHub the universal repo site for public source code projects. Its private counterpart is an emulation named GitLab.
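# Git
#discard local changes to one file
git checkout HEAD -- my-file.txt
#delete a remote branch
git push origin --delete <branch_name>
#delete a local branch
git branch -d <branch_name>
#point origin at a different URL
git remote set-url origin https://some_url/some_repo
#discard all uncommitted changes to tracked files
git reset --hard
#remove untracked files and directories
git clean -f -d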
Conclusion
This post is more of a cheatsheet and summary than it is for learning. Each individual subject could be expounded upon in a book. The reader can use this to gain a superficial understanding and have some commands to try. After learning more about the details, it can be referenced for a quick review.