Monday, 15 August 2011

Xilinx ISE on Linux

The FPGA vendor Xilinx started offering its development tools (ISE, WebPack) for Linux very early. Unfortunately there are still some problems, for which workarounds are documented here.

Wind/U X-toolkit Error: wuDisplay: Can't open display

This error message comes from the Windows-to-Unix toolkit Xilinx uses for its GUI. It doesn't like the DISPLAY=:0.0 value. The workaround is to set it to ":0":
export DISPLAY=:0
The problem was seen with ISE 10.1 and 12.3.

Kudos for pointing out the solution go to circuitben.

XILINX JTAG tools on Linux without proprietary kernel modules


For more detailed information please visit http://www.rmdir.de/~michael/xilinx/.

On Debian you need fxload

aptitude install fxload
and then set up the necessary files (as root!)
/opt/Xilinx/ISE/10.1/ISE/bin/lin/setup_pcusb
This installs some files to /usr/share (yes, directly, but Impact really requires them there :-( ). Ensure that the .hex files in /usr/share are world-readable.

Before you start the ISE, set the environment variable
export XIL_IMPACT_USE_LIBUSB=1

The above instructions are required for ISE 10.1, 12.3 and 12.4.

The JTAG tool has the following USB IDs
unconfigured    03fd:000f
configured      03fd:0008
(see also: /etc/udev/rules.d/xusbdfwu.rules).

FPGA Editor

The FPGA Editor of ISE 12.4 needs Motif 3 (libXm.so.3). Luckily libXm.so.4 can also be used, so simply set a symbolic link.
cd /usr/lib
ln -s libXm.so.4 libXm.so.3
The tool also requires libstdc++5.
aptitude install libstdc++5

Simulation with ISIM

ISim converts the VHDL and Verilog files to C files and uses gcc to compile them. Xilinx ships GCC and all libraries, but these didn't work for me and others. The following trick helped: replace the shipped version of libstdc++ with the one of the Linux system.
cd /opt/Xilinx/ISE/12.4/ISE_DS/ISE/lib/lin64
rm libstdc++.so libstdc++.so.6
ln -s /usr/lib/libstdc++.so.6 .
ln -s libstdc++.so.6 libstdc++.so

Yet Another Networking Introduction


Basics

Networking is built upon the famous OSI layers. We concentrate only on the TCP/IP stack here, so we can simplify this a bit. The layers are as follows:
Application
TCP, UDP, ...
IP, ICMP, ARP, ...
Ethernet, ...

We assume an Ethernet link between our hosts. Every host has an Ethernet MAC and PHY chip. The PHY is connected to the network wire (e.g. twisted pair, coax, fiber); we don't care about it here. The MAC chip is responsible for the Medium Access Control (hence its name). Whenever a packet arrives, it compares the packet's destination MAC address to its own, and if they match, the packet is stored. Otherwise it is ignored. When a packet should be sent, the MAC waits until the physical medium is free (i.e. no other device is transmitting data) and then transmits the packet.
Preamble | Dest. MAC | Src. MAC | Type | Data | Checksum | Postamble

The preamble and the postamble are only important for the PHY layer (clock recovery, ...). The destination MAC consists of 48 bits and is written like this: 00:1a:53:2c:92:f4. It comes first so that all receivers can determine whether they are addressed or can ignore the packet. The 48 bit source MAC address follows, and then come the 16 bits of the protocol type. It specifies the wrapped protocol, i.e. whether an IP, an IPv6, an ARP or any other packet is wrapped.
Type       Protocol
<=0x05DC   Length field for IEEE 802.3 frames
0x0800     Internet Protocol (IP) ver. 4
0x0806     Address Resolution Protocol (ARP)
0x86dd     Internet Protocol ver. 6 (IPv6)
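
To make the frame layout concrete, here is a minimal shell sketch that splits the first 14 bytes of a frame (destination MAC, source MAC, type) into its fields. The function name parse_eth_header and the sample frame are made up for this illustration; the input is assumed to be 28 lowercase hex digits without the preamble.

```sh
parse_eth_header() {
    # $1: destination MAC (12 hex digits) + source MAC (12) + type (4)
    dst=$(printf '%s' "$1" | cut -c1-12)
    src=$(printf '%s' "$1" | cut -c13-24)
    etype=$(printf '%s' "$1" | cut -c25-28)
    # Map the type field to the protocol names from the table above.
    case "$etype" in
        0800) proto="IPv4" ;;
        0806) proto="ARP" ;;
        86dd) proto="IPv6" ;;
        *)    proto="other" ;;
    esac
    echo "dst=$dst src=$src type=0x$etype ($proto)"
}

# Hypothetical ARP request: broadcast destination, sender 00:1a:53:2c:92:f4
parse_eth_header ffffffffffff001a532c92f40806
```

The call prints the three fields on one line and names the wrapped protocol as ARP.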

Every IP packet is completely wrapped into one Ethernet frame. IP has its own address space; IP addresses look like this: 192.168.1.2. Every packet holds a source and a destination address. The packet is sent through the lower levels (MAC, PHY).

IP can do more than Ethernet: it introduces the concept of routing. While for the Ethernet protocol all hosts which communicate with each other must be connected to the very same physical segment (i.e. one long coax cable or one hub), IP can overcome this limit. IP can even be transported via totally different media and transport protocols, e.g. via modem (phone line), ADSL, Frame Relay, ATM, ... The transport of a packet from one medium to another is accomplished by a router, which has to do a bit more processing than just PHY and MAC chips.

To explain routing, we have to look at the addresses. While MAC addresses on a single network segment can vary largely (due to different network interface card vendors), all IP addresses on a single network segment must be similar. To be more specific, they have to stem from a single subnet. Connecting subnets is the job of routers.

A subnet is a range of consecutive IP addresses. An IP address is stored in a 32 bit variable. When this value is masked (bitwise AND) by a bitmask with a certain number of MSB ones, all addresses which result in the same masked value belong to the same subnet. With the netmask 0xFFFFFF00 (usually written as 255.255.255.0), the addresses 192.168.1.0-192.168.1.255 form a subnet, and 10.10.10.0-10.10.10.255 form a subnet too. A subnet is always characterized by its lowest address (the so-called network address) and its netmask, e.g. 192.168.1.0/255.255.255.0 or 192.168.1.0/24. The latter notation specifies the number of MSB ones.
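
The masking step can be tried out directly in the shell. The following is a minimal sketch; the helper names ip_to_int and same_subnet are made up for this illustration and are not standard tools. It converts dotted addresses to 32 bit numbers, ANDs them with the netmask and compares the results.

```sh
ip_to_int() {
    # Split "a.b.c.d" at the dots and combine the four octets into one number.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

same_subnet() {
    # Usage: same_subnet ADDR1 ADDR2 NETMASK
    mask=$(ip_to_int "$3")
    # Two addresses share a subnet if their masked values are equal.
    if [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]; then
        echo yes
    else
        echo no
    fi
}

same_subnet 192.168.1.17 192.168.1.200 255.255.255.0   # prints "yes"
same_subnet 192.168.1.17 10.10.10.5 255.255.255.0      # prints "no"
```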

In the early days of TCP/IP, the netmask was implicitly specified by the network address. All addresses from 1.0.0.0-127.0.0.0 are so-called class A nets with a netmask of 255.0.0.0. Addresses from 128.0.0.0-191.255.0.0 are class B nets with a netmask of 255.255.0.0. Addresses from 192.0.0.0 to 223.255.255.0 are class C nets with a netmask of 255.255.255.0. Addresses from 224.0.0.0 upwards are special multicast addresses which are not used directly by hosts. Nowadays only the names of the three classes are used as abbreviations for netmasks, but the netmask is no longer specified implicitly by the network address. The newer standard with explicitly given netmasks is called Classless Inter-Domain Routing (CIDR). It is more flexible, because "intermediate classes" can also be built, e.g. 255.255.224.0.
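
The relation between the number of MSB ones and the dotted netmask can be checked with a few lines of shell arithmetic. This is only a sketch; the function name prefix_to_netmask is invented for this example.

```sh
prefix_to_netmask() {
    # Usage: prefix_to_netmask BITS
    # Shift 32 one-bits to the left so that only the upper BITS bits stay set.
    mask=$(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF ))
    # Print the four octets in dotted notation.
    echo "$(( (mask >> 24) & 255 )).$(( (mask >> 16) & 255 )).$(( (mask >> 8) & 255 )).$(( mask & 255 ))"
}

prefix_to_netmask 8     # prints 255.0.0.0       (former class A)
prefix_to_netmask 16    # prints 255.255.0.0     (former class B)
prefix_to_netmask 24    # prints 255.255.255.0   (former class C)
prefix_to_netmask 19    # prints 255.255.224.0   (an "intermediate class")
```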

Consider the subnet 192.168.0.0/24. It has the range 192.168.0.0 - 192.168.0.255. Note that no host may have the first or the last address in the subnet. The first address, here 192.168.0.0, is called the network address. The last address, 192.168.0.255, is the broadcast address.
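
Both special addresses follow from the host address and the netmask: the network address has all host bits cleared, the broadcast address has all host bits set. A small sketch in shell arithmetic (the helper names are made up for this illustration):

```sh
ip_to_int() {
    # Split "a.b.c.d" at the dots and combine the four octets into one number.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

int_to_ip() {
    # Print a 32 bit number in dotted notation.
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

network_address() {
    # Usage: network_address ADDR NETMASK
    int_to_ip $(( $(ip_to_int "$1") & $(ip_to_int "$2") ))
}

broadcast_address() {
    # Usage: broadcast_address ADDR NETMASK
    mask=$(ip_to_int "$2")
    # The inverted netmask selects the host bits, which are all set to one.
    int_to_ip $(( ($(ip_to_int "$1") & mask) | (mask ^ 0xFFFFFFFF) ))
}

network_address 192.168.0.42 255.255.255.0     # prints 192.168.0.0
broadcast_address 192.168.0.42 255.255.255.0   # prints 192.168.0.255
```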

Back to routing. Whenever the destination address is within the same subnet as the host's own address, it can by definition be reached directly via the Ethernet link. For addresses outside of the subnet, the packet has to traverse a router. The IP protocol stack sends the packet to the router, which forwards it to the other network. A packet can therefore travel around the world through several routers.

Since there are two parallel address spaces (MAC and IP addresses), we need a translation facility for them. Whenever a host wants to send a packet to another host, it usually knows the other host's IP address but not its MAC address. The Address Resolution Protocol (ARP) is used to fill this gap. An ARP request is broadcast to all hosts on the segment (every MAC chip listens to its own address plus the broadcast address FF:FF:FF:FF:FF:FF), and the TCP/IP stack of the destination computer sends an answer to the requesting host. The requesting host then stores the new information (MAC<->IP address relation) in its ARP table, sometimes also called the neighbor table. Then the real packet is sent with the receiver's MAC address as destination.

This is necessary because all hosts only listen to their own MAC address on the physical medium (and the broadcast address), so the transmitter must fill in the correct destination MAC address.

When the packet's destination is outside of the host's own subnet, the IP stack uses the MAC address of the router. The Ethernet frame then holds the destination MAC of the router and the destination IP of the real destination. This is how the router can determine whether the packet is for itself or should be forwarded (i.e. routed) to another one of its network interfaces.

Important: IP packets are stateless and connectionless. That means that every single IP packet is transmitted on its own. There is no relation between several consecutive packets between two hosts. It is even possible that they take different routes (e.g. due to load changes) and therefore arrive in a different order than they were sent. IP packets also do not have ports and don't have error handling (e.g. retransmission).

On the other hand, IP can send error messages. This is done using ICMP packets. They have different types and codes (two 8-bit values) which specify the error. There are even ICMP requests, like an echo request. This is answered by an ICMP echo reply packet. This is how the ping command is implemented.

Every IP packet holds the field Time To Live (TTL), which is decremented by every router. When it reaches 0, an ICMP error message ("time exceeded in transit") is sent to the original sender. Say the original packet has TTL=64, then the first router decrements it to TTL=63, the next router to TTL=62, ... If the packet enters a routing loop (due to a wrong router configuration), the TTL is decremented until it reaches 0. This prevents packets from circulating infinitely.

The TTL field is used by traceroute for diagnostic purposes. It sends a packet to the destination but sets TTL=1. The first router will then send an ICMP error message ("time exceeded in transit"). The program displays the source IP address of this ICMP packet, which is the first router. Then a packet with TTL=2 is sent, and the second router on the route will issue the ICMP error packet. This is continued with increasing TTL values until the destination is reached.

The IP protocol is almost never used directly. Higher level protocols like TCP, UDP or ICMP are wrapped in IP packets. They provide additional features over IP. We already talked about ICMP. It is also connectionless and stateless and is only used as a service facility.

UDP adds source and destination port numbers and a checksum on top of IP. It is also connectionless and stateless, so a higher level protocol (like NFS, SMB, SNMP, DHCP, TFTP, ...) is responsible for error recovery, packet order, ... Therefore it is called a datagram protocol. UDP is very simple and thus used by lots of embedded systems.

TCP is the most powerful protocol in this family. Similar to UDP it adds source and destination port numbers and a checksum on top of IP. Additionally it adds a sequence number, acknowledgement packets, and some options. The TCP protocol implements packet reordering and error recovery (retransmission). Note that a TCP connection is stateful, which means that a connection is initialized and then held active until it is closed. The data stream is always bidirectional. Whenever a packet is sent from A to B, B replies with an ACK packet. So even for a unidirectional data stream, there are packets going in both directions. The TCP protocol implementation is very complicated, because it uses timers, states, ... Due to its good service quality it is widely used by many higher level protocols like HTTP(S), POP3, IMAP, SMTP, SSH, FTP, Telnet, ...

Until now we always talked about IP addresses in IP packets (or ICMP/UDP/TCP packets wrapped inside IP packets) like 192.168.0.1. It is very inconvenient to remember those numbers. To circumvent this problem, the Domain Name System (DNS) was created. DNS itself uses the IP protocol, more precisely the UDP protocol with port 53. Whenever you type a domain name like johann-glaser.blogspot.com into your browser or at the command line of e.g. ping, the program itself first has to translate this name into the corresponding IP address.

This is done by the C function gethostbyname() which is implemented in the libc. It connects to the DNS server, sends a DNS query packet and waits for the DNS reply. Then the replied IP address is used by the program to connect() to the host.

Important: When you want to test whether the network connection is working, a "ping johann-glaser.blogspot.com" doesn't tell you clearly whether the connection is ok. If it results in an error, the DNS lookup could have failed while the connection is perfectly ok. So you should use "ping 74.125.39.132" instead, because this doesn't need a DNS lookup but directly checks the connection.
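
The two failure causes can be told apart by testing them one after the other. The following is a sketch only: the function name check_host is invented, and it assumes that getent and a Linux ping (with the -c and -W options) are installed.

```sh
check_host() {
    # Usage: check_host NAME IP
    # Step 1: resolve the name only, without sending any ping.
    if getent hosts "$1" >/dev/null 2>&1; then
        echo "DNS lookup for $1: ok"
    else
        echo "DNS lookup for $1: FAILED"
    fi
    # Step 2: ping the numeric address, which needs no DNS lookup.
    # -c 1 sends a single echo request, -W 2 waits at most 2 seconds.
    if ping -n -c 1 -W 2 "$2" >/dev/null 2>&1; then
        echo "ping $2: ok"
    else
        echo "ping $2: FAILED"
    fi
}

# Example call (not executed here because it needs network access):
#   check_host johann-glaser.blogspot.com 74.125.39.132
```

If the first line reports a failure but the second one is ok, the connection works and only the name resolution is broken.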

Another important protocol used in the background is the Dynamic Host Configuration Protocol (DHCP). It is used to set up the IP settings of a host. Before the host knows its own IP address, the DHCP client daemon sends a DHCP request in a UDP broadcast packet (255.255.255.255, server port 67) to the network. This is captured by a DHCP server. The server replies with a DHCP reply which holds the IP address, netmask, default gateway, DNS server IPs, lease time and possibly many other options like TFTP IP, TFTP file path, NTP server, ... Therefore a DHCP server can remotely set up a client (upon the client's request).

Interface Configuration: ifconfig

The network interface card (NIC) is called an "interface". Usually it is named eth0. The second card is called eth1, and so on. To configure the IP parameters of the NIC, the program ifconfig is used. To display the current configuration, simply use
ifconfig eth0
To set up the parameters, use
ifconfig eth0 [ip-addr] netmask [netmask]

The network card initially is down (i.e. switched off). To switch it on, use
ifconfig eth0 up
To switch it off again, use
ifconfig eth0 down

Routing Table: route

Every host itself is a little router, except that it usually has only one NIC and therefore doesn't forward packets. Every host holds its own routing table. This tells the IP stack where to send IP packets. To display the routing table, use
route -n
The -n option tells route not to do a reverse name lookup for the IP addresses when displaying them. This speeds up the program. Usually you have two entries:
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
0.0.0.0         192.168.0.1     0.0.0.0         UG    0      0        0 eth0
The first line here shows that packets with a destination in 192.168.0.0/255.255.255.0 are sent directly (without a gateway, 0.0.0.0) via the interface eth0. This is the local subnet. The second line is the default route. It tells the routing algorithm that any packet which is not matched by any previous routing table entry must be sent via the router (i.e. default gateway) 192.168.0.1.
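
The decision the IP stack takes with this two-entry table can be imitated in a few lines of shell. This is a simplified sketch with invented function names: the table entries are hard-coded, and the more specific entry is simply tested first, whereas a real IP stack searches the whole table for the most specific match.

```sh
ip_to_int() {
    # Split "a.b.c.d" at the dots and combine the four octets into one number.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

next_hop() {
    # Usage: next_hop DESTINATION
    dst=$(ip_to_int "$1")
    # Entry 1: 192.168.0.0 / 255.255.255.0, gateway 0.0.0.0 (local subnet)
    if [ $(( dst & $(ip_to_int 255.255.255.0) )) -eq "$(ip_to_int 192.168.0.0)" ]; then
        echo "direct via eth0"
    else
        # Entry 2: 0.0.0.0 / 0.0.0.0 matches every remaining destination
        echo "via gateway 192.168.0.1 on eth0"
    fi
}

next_hop 192.168.0.26     # prints "direct via eth0"
next_hop 74.125.39.132    # prints "via gateway 192.168.0.1 on eth0"
```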

When you use ifconfig to set up the IP address as described above, one routing table entry is automatically generated. The default route must be added manually with
route add default gw [router]

Flexible IP Stack: ip

The tools ifconfig, route and some more are old relics and have largely been superseded by the much more flexible tool ip (from the iproute2 package). Please see its man page for the documentation.
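
As a rough orientation, the older commands map to ip subcommands approximately as sketched below. The wrapper function names and the address 192.168.0.10 are arbitrary choices for this example; the read-only queries are wrapped in functions so that nothing runs until you call them, and the state-changing commands are only listed as comments because they need root.

```sh
# Read-only queries, usable as a normal user.
show_addresses() { ip addr show; }     # roughly: ifconfig
show_routes()    { ip route show; }    # roughly: route -n
show_neighbors() { ip neigh show; }    # roughly: arp -n

# State-changing commands (as root), with their older counterparts:
#   ip addr add 192.168.0.10/24 dev eth0   # ifconfig eth0 192.168.0.10 netmask 255.255.255.0
#   ip link set eth0 up                    # ifconfig eth0 up
#   ip link set eth0 down                  # ifconfig eth0 down
#   ip route add default via 192.168.0.1   # route add default gw 192.168.0.1
```

Note that ip expects the netmask in the /24 notation rather than as a dotted netmask.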

ARP Table: arp

The current ARP table can be displayed with the command
arp -n
Again, the parameter -n avoids reverse name lookups. The output looks like
Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.0.26             ether   00:0E:A6:C1:6C:B3   C                     eth0
192.168.0.38             ether   00:50:BA:1C:11:02   C                     eth0
192.168.0.204            ether   00:12:3F:E3:3E:33   C                     eth0
The table shows each IP address and its corresponding MAC address as well as the interface where the IP/MAC pair has been found.

DNS Server

The DNS server is configured in the file /etc/resolv.conf. There are two types of entries: 'search' and 'nameserver'. At the ICT we have the following content:
search example.com
nameserver 192.168.0.204
nameserver 192.168.0.203
The first line specifies the "default domain". You can shortcut the domain name "www.example.com" with "www". The other lines specify several name servers. When the first one doesn't reply, the second one is used as a fallback, and so on. The addresses must be specified as IP addresses (rather than domain names) because they can not be resolved before a DNS is known by its IP address.

iptables

The Linux kernel starting from version 2.4 uses the firewalling infrastructure Netfilter. This is completely built into the kernel. To setup the firewall rules, the userspace program iptables is used. Please refer to its man page for documentation.

Diagnostics: ping, netstat, traceroute

Echo Request: ping

The tool ping was already described above. It sends an ICMP echo request to the specified IP address (after a name lookup if you supplied a domain name). The ICMP echo replies are displayed. It is important to know that before the IP packet (wrapping the ICMP packet) can be sent, an ARP request is issued. If the IP address does not exist on the network, no ARP reply is received (after three retries). Your host then creates an ICMP error packet for itself and ping displays a "Destination Host Unreachable" error message. If you ping an IP outside of your subnet, no such ICMP error messages are generated or received.

Network Connections: netstat

When a program listens on a UDP or TCP port or a program has an open connection to another host, they can be displayed with the tool netstat. Use
netstat -anptu
to show all open and listening UDP and TCP connections. Note that only root can see the process IDs and names of programs owned by other users.

Route to Destination: traceroute

traceroute was also already described above. It is used to see if the route to a host is ok.

Remote Access

Remote Login: telnet

The simplest tool to connect to another host via the TCP protocol is telnet. Use it with
telnet host [port]
where the port is optional and defaults to 23. Everything you type into the program is sent as a TCP packet to the remote host. Everything received is printed to the console. You can quit with ^] (on a German keyboard press Ctrl+AltGr+9) and then type "quit".



Important: Don't use this program for remote logins because everything is transmitted unencrypted, even your password. Only use it for diagnostic purposes, e.g. to simulate an SMTP session by hand.

Network Transport: netcat, nc

Similar to telnet, netcat (usually called nc) connects via TCP and transmits data. It is somewhat more automated and direct than telnet. It connects stdin and stdout between two hosts. Use
nc -l -p 12345
to listen ("-l") on port 12345 on one host and
nc hostname port
on the other host. Everything redirected to stdin of one nc is written to stdout of the other nc.

File Transfer: ftp

The FTP protocol is used to transfer files from an FTP server to the FTP client or in the opposite direction. The command line program ftp is a frontend for the protocol. You can download and upload files from/to the server.

Tiny File Transfer: tftp

For small embedded systems, the FTP protocol (using TCP) is too complicated, so the TFTP protocol was created. It uses UDP and does not have anything in common with FTP except its purpose and part of its name.

Download: wget

wget is a pure download program. Use it like
wget http://www.example.com/file.ext
or
wget ftp://ftp.example.com/file.ext

Secure Shell: ssh

Formerly telnet was used to log in remotely to other hosts. Unfortunately it doesn't encrypt any data, so even the passwords can be sniffed easily by intruders. SSH, the Secure SHell, was constructed to solve these problems. All data is encrypted. This makes it more complicated, so on embedded systems sometimes only telnet is available. Use it with
ssh user@host

Secure Copy: scp

scp is an add-on to SSH which uses this protocol to copy files. Use it with one of the following command lines
scp localfile.ext user@host:remote/path/to/destination/
scp user@host:remote/path/to/file.ext .
scp userA@hostA:remoteA/pathA/toA/fileA.ext userB@hostB:remoteB/pathB/toB/destinationB/

Secure File Transfer: sftp

SSH also adds the secure file transfer protocol, which is used with the sftp program. Its use is similar to ftp.

Sunday, 14 August 2011

Yet Another Linux Introduction

Basics

Every directory has two special subdirectories called "." and "..". "." always points to the directory itself, so "/path/to/./././" is equal to "/path/to/". ".." points to the parent directory, so "/path/to/../" equals "/path/".

Files and directories with names starting with a period "." are hidden by convention. Usually they are not listed by ls and file dialogs. Note that this is not a file system function but only a convention. You can see all hidden files with the "-a" parameter of ls:
ls -la
Give it a try in your home directory, you will be surprised!

Every program has three file descriptors when it is started. These are
Number  Name    Description
0       stdin   input to the program
1       stdout  output of the program
2       stderr  error messages of the program

That means that every (console) program writes its output to stdout and gets its input from stdin. Error messages are written to stderr. These file descriptors can be redirected, see below.

An important feature of Linux is that everything is a file. This means that every device, every pipe, ... is a file. The hard disk is usually the file /dev/hda, its first partition is /dev/hda1, and so on. The directory /dev/ holds lots of "special" files, called device nodes. The device "/dev/null" is the Nirvana. Everything written to /dev/null is ignored (and thus lost). Redirecting error messages to /dev/null is a common way to discard them.
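
A short sketch of how the three file descriptors and /dev/null play together. The function demo is a stand-in for any program that writes to both stdout and stderr; it is made up for this illustration.

```sh
demo() {
    echo "normal output"        # goes to stdout (descriptor 1)
    echo "error message" >&2    # goes to stderr (descriptor 2)
}

demo 2>/dev/null         # discard the error messages, keep the output
demo >/dev/null          # discard the output, keep the error messages
demo >/dev/null 2>&1     # discard both: stderr is sent to where stdout goes
```

The order matters in the last line: stdout is redirected first, then stderr is pointed at the same target.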

Programs usually get command line parameters when executed. There are option parameters. These have a long form, e.g. "--recursive", "--invert-match" and a short form "-r", "-v". Short forms can always be combined, e.g. "-rv" (for "grep"). Some options need an additional parameter, e.g. "--file=FILE" or "-f FILE". In this case the "-f" must be the last one in a combination immediately followed by the parameter, e.g. "-rvf FILE". Long forms can not be combined.

If a command expects a file name as its parameter, it is always the last one. This is necessary, because most programs can accept one or more file names, e.g. "grep -v pattern FILE1 FILE2". How this is used by the shell is documented here.

Getting Help

The Program Explains itself: --help

Nearly every program can be executed with the command line parameter --help.

The Manual: man

One of the most powerful features of Linux is its manual pages. Every (!) program is documented in quite extensive manual pages. You can access them with
man program
where program is the name of the program, e.g. grep, head or even man itself. Go on, give it a try! Manual pages are usually viewed with less (see below). To quit press the [Q] key.

Man pages are organized in 8 chapters. Sometimes there are man pages with the same name in more than one chapter. In that case use e.g.
man 3 printf
for the LibC function printf().

To search a tool for a certain task, e.g. to download a file from the internet, use
apropos download
This searches in all one-line-descriptions of all manpages.

The Info Pages: info

Many programs additionally have info pages. This documentation is usually more verbose than man pages and organized as hyper text (with links, ...). Usage:
info program
where program is e.g. nano, gdb or even info itself.

Terminal

Unix terminals (XTerm, Konsole, Eterm, RXVT, the text console, ...) are feature rich. E.g. they can display colors, move the cursor, ... The most often used feature is the scrollback of the output. When the screen is full and more lines come, the screen is scrolled. You can look at what was there before by pressing [Shift]-[PgUp] and [Shift]-[PgDn]. A certain number of lines is kept in the buffer.

Pattern Search: grep

grep is used to search for strings inside of files. The pattern to match is given as a regular expression. Grep searches line by line. It is used like this:
grep pattern file1 file2 ...
Usually you want to put the pattern in quotes (e.g. "foo.*bar")
The option '-v' inverts the match, so all lines not containing the pattern are printed to stdout.
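
A small self-contained example of both modes. The function sample_lines is made up for this illustration and stands in for a text file; when no file name is given, grep reads from stdin.

```sh
sample_lines() {
    # Three example lines of input.
    printf '%s\n' "foo and bar" "only foo" "bar then foo"
}

# Print the lines in which "foo" is followed somewhere by "bar".
sample_lines | grep "foo.*bar"

# Print all other lines instead.
sample_lines | grep -v "foo.*bar"
```

The first command prints only "foo and bar"; the second prints the two remaining lines. Note that "bar then foo" does not match, because the pattern requires foo to come before bar.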

File Comparison: diff

To compare two text files, diff is used. Give it two file names and both files are compared. If the files are identical, diff will not print anything.
diff file-old.ext file-new.ext
You can use the command line parameter -u to get a unified diff, which is somewhat more beautiful and readable. To compare directories recursively, use the parameter -r (note: both items given to diff must then be directories).
diff -ru /path/to/old/ /path/to/new/

File Patch: patch

patch uses the output of diff -u to apply a difference to a file or directory tree. This is used to transfer small changes of huge files or directory trees.

Filesystem Mounting: mount and umount

In Unix there are no drive letters but a single directory tree. Different partitions, floppies, USB sticks, ... are mounted to various directories (so-called mount points). Every directory can be a mount point, even directories with files and subdirectories inside. These are hidden while the directory is used as a mount point. You mount a partition with
mount -t fstype /dev/xxx /path/to/mount/point/
where fstype is the file system type (e.g. vfat, ext3, reiserfs) and /dev/xxx is the device node (e.g. /dev/hda1, /dev/sdc1, ...).

Execute mount without parameters to show all currently mounted volumes.

To release a mounted volume, use the command umount (note the missing n!).
umount /path/to/mount/point/
This will fail if the volume is still used, i.e. any program has a file or directory on the volume opened or the current working directory of any program is inside the mounted volume.

Kernel Messages: dmesg

The Linux kernel has an internal buffer to store debug and status messages. This buffer can be displayed with dmesg. Usually only the bottom of its output is of interest.

Text Tools I: cat, head and tail

To display the content of a text file, use cat.
cat filename
This is not an animal but the short form of "concatenate". The name comes from the use of this program to concatenate several text files.
cat filename1 filename2 filename3 > totalfile
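
head and tail print only the first or the last lines of their input, 10 lines by default; the option -n sets the number of lines. A brief sketch, in which the function numbers is invented and stands in for a longer text file:

```sh
numbers() {
    # Six lines of example input.
    printf '%s\n' 1 2 3 4 5 6
}

numbers | head -n 2    # prints the first two lines: 1 and 2
numbers | tail -n 2    # prints the last two lines: 5 and 6
```

With a file the usage is the same as for cat, e.g. "head -n 2 filename".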

Text Tools II: sort, join

A text file can be sorted with sort.

join is used to join two sorted text files side by side by a key. Think of a file with the content
1 Eins
2 Zwei
3 Drei
4 Vier
5 Fünf
and another file with the content
1 One
2 Two
3 Three
4 Four
5 Five
then these two files can be joined to
1 Eins One
2 Zwei Two
3 Drei Three
4 Vier Four
5 Fünf Five
by the command
join file1 file2

Text Tools III: tr and sed

To replace certain characters in a file (or data stream), use tr.
tr 'abc' 'ABC' < filein > fileout
Note that tr can't use file names so you have to redirect its stdin and stdout. The above line translates every 'a' to an 'A', every 'b' to a 'B' and every 'c' to a 'C'.

The Stream EDitor sed is way more powerful than tr. It supports a stream editing programming language which is explained in detail in its man page and in lots of books. We just discuss how to substitute strings inside of a stream or text file.
sed -r 's/pattern/replacement/' < filein > fileout
The pattern is any valid regular expression. If you parenthesize certain parts of the pattern, e.g. ^[[:space:]]*([a-z]+).*$, these parts can be referred to in the replacement with \1, \2 and so on.
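
Two small examples of such back-references. The function names first_word and swap_words are made up for this illustration; the -r option (extended regular expressions) is the one used above and is assumed to be supported by the installed sed.

```sh
first_word() {
    # Keep only the first lowercase word of each line; \1 is the first group.
    sed -r 's/^[[:space:]]*([a-z]+).*$/\1/'
}

swap_words() {
    # Exchange two lowercase words; \1 and \2 are the first and second group.
    sed -r 's/^([a-z]+) ([a-z]+)$/\2 \1/'
}

echo "   hello world 123" | first_word    # prints "hello"
echo "john smith" | swap_words            # prints "smith john"
```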

Pager: more and less

When using cat to display the content of a file or when a tool prints lots of text to the screen, it is inconvenient to use the terminal's scrolling capability to scroll back. Therefore a pager is used. The file or stream is read by the pager and displayed page by page.

The first available pager was more. By pressing the [Space] key it forwards (screen) page by page. The key [Return] forwards line by line. Note that you can neither scroll back upwards nor search the text.

This limitation is removed by less (note the pun). You can scroll with the arrow keys, [PgUp], [PgDn], [Home], [End]. To search simply type [/] and then the phrase. All matching words are marked and you can jump from one to the next with the [N] key. Type [h] for less' internal help.

Use your favourite pager with
less filename
or
program_with_lots_of_output params | less
Less accepts the option -S (capital S). Then it doesn't wrap long lines on the screen.

BTW: When viewing man pages, you are using your $PAGER (usually less)

Other Tools: ls, find, locate, sleep

To list the content of a directory, use ls (short form of list, unix guys are lazy :-) ). It accepts many parameters. The most useful are
Parameter     Effect
-a            display all files, including hidden files
-A            display almost all files, including hidden files except . and ..
-l            display in long format
-1            display one file name per line
-d            display the directory name instead of its content
-S            sort by file size
-t            sort by file date
-r            reverse sort order
--color=auto  color files according to their type

With find you can find a file which matches certain conditions in the current and all deeper directories. Usage:
find . -iname 'pattern'
where pattern is a glob pattern. Find can also search for certain file types (e.g. only directories), file sizes, dates, ... Please refer to its extensive man page for the options.

To search for a file on the whole hard disk, find is very slow. Therefore a find index is stored somewhere and locate is used to search through it. This index is sometimes outdated, but global files should not change too often. locate is a substring search in the index, you can't use globs or regular expressions. -i searches case-insensitively.

sleep just waits for a number of seconds (here: 3.2 seconds):
sleep 3.2
You can use integer and float values.

File Tools I: cp, mv, rm, cd, pwd, mkdir, rmdir, chown, chmod

Command  Purpose
cp       copy a file, -r is recursive
mv       move or rename a file, this is always recursive
rm       remove a file (but no directory), -r is recursive
cd       change the current working directory
pwd      print the current (present) working directory
mkdir    create a new directory, -p also creates missing parent directories
rmdir    delete an empty directory, use rm -r for non-empty directories
chown    change the owner (and group) of a file, usage: chown user:group filename, -R is recursive
chmod    change the access permissions of a file, usage: chmod 0644 filename, -R is recursive

Process Tools: ps, kill, killall

To list running processes, use
ps
To see all processes and more information, use
ps faxuw
Terminate a process with
kill pid
with the process' process ID (PID). This sends the "SIGTERM" signal to the process, which can then exit gracefully (write the edit buffer to a file, ...). If the process has crashed badly and doesn't react to the SIGTERM (or it intentionally ignores the signal) you can kill it with
kill -9 pid
which sends the SIGKILL signal. The process cannot ignore this signal and can not exit gracefully.

kill always needs the PID, which is tedious to find out using ps. killall accepts the process name instead
killall myprogram
You can use the -9 option here too.

Network Sockets: netstat

The program netstat shows all open network connections. Use the option '-n' to avoid reverse address lookups (IP -> hostname), '-a' to include listening sockets, '-t' to only show the TCP connections, '-u' to only show the UDP connections, and '-p' to include the program which is using the connection.
netstat -anpt

Network Scanner: nmap

With nmap the open TCP and UDP ports of a remote host can be scanned.
nmap hostname

Network Sniffer: tcpdump and Wireshark

With tcpdump the traffic at a network card of the particular machine is sniffed and printed (beautifully) to the screen. You can supply filters to pick up only the traffic you are interested in
tcpdump -n
tcpdump -n -i eth1
tcpdump -n udp
tcpdump -n icmp
tcpdump -n not port 22
tcpdump -n tcp and port 80
Use Wireshark as a graphical network sniffer and protocol analyzer. You can also use tcpdump on a remote machine to store the sniffed packets into a file ('-w' option) and then load this file on your local machine with Wireshark for an offline analysis.

Editors: joe, vim, nano, kedit, kate, gedit, gvim, xemacs

My favourite editor is joe, because it is similar to WordStar (i.e. old Turbo Pascal editors). Type [Ctrl]-[K] [H] to show its help. nano is similar to the non-free pico. The most feature-rich editor is vim (Vi IMproved), but it is rather tedious to learn. All these editors are pure text editors that run in the terminal window.

kedit is the default editor of the KDE Desktop Environment. I recommend kate, which is more powerful, especially for program development.

Gnome's default editor is gedit. gvim is a graphical frontend to vim.

For editing VHDL files I strongly recommend XEmacs with its powerful VHDL mode.

User Management: su, getent, w, id, whoami, last

To change the current user, use su (Substitute User)
su - username
Only 'root' can do this without knowing the other user's password. If you want to become root, you can simply omit the username.

The program getent displays entries of the administrative databases. Usage
getent database [key ...]
where database is one of passwd, group, hosts, services, protocols, or networks.

With w you can determine who is logged in and what they are currently doing.

id shows the current UID and GIDs.

whoami shows the current user's username.

With last all previous logins and logouts are listed (up to a certain point in the past). This is read from the file /var/log/wtmp. Programs like login, su, ssh, ... append entries to this (binary) file.

Misc: mknod, mkfifo, strings, file

To create device special files (usually in /dev/) use mknod. For named pipes you need mkfifo.
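A named pipe is easiest to understand by sending one line through it. This is a minimal sketch; the temporary directory and the names dir and msg are assumptions made for the example only.

```shell
# Create a named pipe in a private temporary directory.
dir=$(mktemp -d)
mkfifo "$dir/pipe"

# The writer blocks until a reader opens the pipe, so run it in the background.
echo "hello through the pipe" > "$dir/pipe" &

# The reader receives exactly what the writer sent.
msg=$(cat "$dir/pipe")
wait
echo "received: $msg"

# A named pipe is a file system entry and has to be removed like a file.
rm -r "$dir"
```

Unlike a regular file, the pipe never stores the data on disk. It only connects the two processes.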

The program strings filters all text strings from a binary (e.g. executable) file.

The command file uses magic to determine the type of a file (see also man magic). In Unix file types usually are not dependent on their file name (extension) but only on file content.

Filesystem: df, du

With df (Disk Free) all mounted partitions are shown including the used and free space. The numbers are blocks, usually 1 KiB blocks.

du (Disk Usage) displays the size of a directory and all its subdirectories.

Packing: zip, unzip, tar, gzip, bzip2

Command    Purpose
zip        compress files/directories into one zip archive, usage:
           zip archive.zip file1 file2 *.txt
           zip -r archive2.zip src/
unzip      uncompress an archive into the current directory, -l just lists the content of a zip file
tar        Tape ARchive, similar to zip/unzip. Create: tar cvfz archive.tar.gz files..., eXtract: tar xvfz archive.tar.gz, lisT (= test): tar tvfz archive.tar.gz. With 'z' tar internally calls gzip; replace the 'z' by a 'j' (and '.gz' by '.bz2') to use bzip2.
gzip       compress single files with the gzip algorithm, every file is compressed and gets the (additional) extension '.gz'
gunzip     uncompress .gz files
bzip2      compress single files with the bzip2 algorithm, every file is compressed and gets the (additional) extension '.bz2'
bunzip2    uncompress .bz2 files
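The three tar modes from the table can be tried in one round trip. This is a sketch under the assumption that a tar with gzip support is installed; all file and directory names are invented for the example.

```shell
# Work in a private temporary directory.
work=$(mktemp -d)
mkdir "$work/src" "$work/out"
echo "hello" > "$work/src/a.txt"

# c = create, z = gzip, f = archive file name; -C changes the directory first.
tar czf "$work/archive.tar.gz" -C "$work" src

# t = list the content without extracting anything.
listing=$(tar tzf "$work/archive.tar.gz")
echo "$listing"

# x = extract, here into a different directory.
tar xzf "$work/archive.tar.gz" -C "$work/out"
restored=$(cat "$work/out/src/a.txt")
echo "restored content: $restored"

rm -r "$work"
```

The 'v' (verbose) flag is left out here to keep the output short.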

Calculator: bc

Usage:
bc -l
(-l enables floating point calculation) Type in a formula and get the result.

Yet Another Bash Introduction


A shell is the command line user interface. It provides the prompt, e.g.
username@pc87 ~ $
where you type commands. These commands are then executed. The first shell was called sh. Many different improvements have been published, like ksh, csh, tcsh, zsh, bash, ash, dsh, ... Nowadays bash is the most commonly used shell on Linux. For complete documentation please refer to its amazingly extensive man page (on my machine it has 5375 (!) lines).

User Interface

The bash uses readline for a comfortable look and feel. readline is a library also used by other programs for keyboard input. It gives you many features over a simple input line. With the arrow keys [Left] and [Right] you can move the cursor back and forth. [Backspace] and [Del] are used to delete characters before or at the cursor, respectively. Use the arrow keys [Up] and [Down] to get command lines you have previously typed (so called history).

A very impressive feature of the bash is file name completion. Just type the beginning of a file name and then press the [Tab] key. The file name will be completed. If it is not unique, the shell will only complete the common part and beep. Pressing [Tab] twice gives you a list of all possibilities. This command line completion is very smart and extensible. Try it and you will miss it whenever it is not available.

Another very useful feature is the (reverse) history search with the ^R key combination. Just type [Ctrl]+[R] and then any part of a previously entered command line. This will be presented. Pressing [Enter] immediately invokes this command again. You can also use the arrow keys to modify the old command line before execution.

To get the last argument (the last word of the previous history entry) for your current command line use [Alt]+[.]. Successive usage moves back through the history list, inserting the last argument of each line in turn.

There are some useful navigation key combinations:
Key          Function
[Alt]+[F]    move cursor one word forward
[Alt]+[B]    move cursor one word backward
[Ctrl]+[K]   delete until end of input line
[Ctrl]+[U]   delete everything before the cursor
[Ctrl]+[C]   don't execute command line and print a new prompt
[Ctrl]+[L]   clear the screen keeping the command line

Note that some keys are not transmitted properly via some terminal connections and thus don't work.

Pressing [Ctrl]-[C] will terminate the currently running program.

Shell commands I: history, alias, echo

With the builtin command history all previous command lines are shown. When the shell exits, the history is written to the file .bash_history. Usually only the last 500 command lines are kept.

The builtin command alias is used to create command aliases. Use it e.g. for
alias dir='ls -l --color=auto'
if you like the command dir.

The builtin command echo just prints its parameters to stdout. It always prints a newline at the end. This can be suppressed with the -n option. Also have a look at the builtin printf which is more powerful than echo.
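To give an impression of what printf can do beyond echo, here is a short sketch. The strings and numbers are arbitrary example values.

```shell
# %s inserts a string and %d an integer; printf adds no newline by itself.
line=$(printf '%s has %d items' "cart" 3)
echo "$line"

# Width specifiers give aligned columns: left-aligned name, right-aligned number.
row=$(printf '%-6s|%4d' "abc" 42)
echo "$row"
```

The format string works like the one of the C function printf, which is why it is a good choice whenever the output has to follow a fixed layout.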

Environment Variables

In addition to its command line, every program receives its environment variables. These can be defined in the bash by
VARIABLE=content
By default these variables are not inherited by any executed program. To export a variable to all future child processes, use
export VARIABLE
You can combine the assignment and the export statement
export VARIABLE=content
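The difference between a plain assignment and an exported variable becomes visible as soon as a child process is started. This is a minimal sketch; GREETING is an arbitrary variable name chosen for the example, and sh -c merely stands in for "any executed program".

```shell
# Make sure the name is not already in the environment.
unset GREETING

# A plain assignment creates a shell variable that child processes do not see.
GREETING=hello
before=$(sh -c 'echo "${GREETING:-not set}"')

# After export, every child process started from now on inherits it.
export GREETING
after=$(sh -c 'echo "${GREETING:-not set}"')

echo "before export: $before"
echo "after export:  $after"
```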
Important and commonly used variables are
EDITOR your favorite editor, this is used by svn, crontab -e, ...
PAGER your favorite pager, usually less, this is used by man, ...
PATH a list of all directories where executables should be searched
LD_LIBRARY_PATH a list of all directories where libraries should be searched
DISPLAY the display for X programs
HOME the current user's home directory
IFS the Internal Field Separator that is used for word splitting after expansion, only used internally by the bash
USER the currently logged in user
UID the numeric user ID of that user
TERM the terminal type you are using


Environment variables can also be used in command lines with the $ prefix. Usage:
echo $PATH
echo ${PATH}
The curly braces { } are optional but recommended. They are obligatory when the variable name is directly followed by characters that could be part of a name, e.g. echo "I'm the ${UID}th user!".

Most environment variables which are a list (e.g. ${PATH}, ${LD_LIBRARY_PATH}, ...) are a colon separated list (':').

With environment variables you can even do string processing.
${parameter:-word}            Use Default Values. If parameter is unset or null, the expansion of word is substituted.
${parameter:offset}
${parameter:offset:length}    Substring Expansion. Expands to up to length characters of parameter starting at the character specified by offset (counting from 0).
${#parameter}                 The length in characters of the value of parameter is substituted.
${parameter#pattern}          Delete the shortest match of pattern from the beginning of parameter (## deletes the longest match)
${parameter%pattern}          Delete the shortest match of pattern from the end of parameter (%% deletes the longest match)
${parameter/pattern/string}   Substitute the first match of pattern by string in parameter (// substitutes all matches)
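The expansions listed above are easier to remember with concrete values. The following sketch needs bash, because the substring and substitution forms are not available in a plain sh. The path and all variable names are invented for the example.

```shell
#!/bin/bash
file=/home/user/archive.tar.gz
unset missing

default=${missing:-fallback}   # variable is unset, so the default is used
sub=${file:6:4}                # 4 characters starting at offset 6
len=${#file}                   # length of the whole string
base=${file##*/}               # delete the longest match of */ at the beginning
noext=${file%.gz}              # delete .gz at the end
swapped=${file/user/guest}     # replace the first "user" by "guest"

echo "$default $sub $len"
echo "$base"
echo "$noext"
echo "$swapped"
```

The ## form together with the pattern */ is a common replacement for the external program basename.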


Command line expansion

When you enter a command line it is preprocessed by the bash before the command is executed. All environment variables are substituted by their value. All file name wildcards are expanded. That means that if you type
ls *.txt
the command ls does not get the string '*.txt' but it gets the expanded list of files matching the wildcard, e.g. file1.txt file2.txt file3.txt. Try this with echo *.txt.

Note that all command line parameters are split up (using ${IFS}) and then supplied as a list (remember int main(int argc, char* argv[])). Thus, the command
echo This      is a       string with lots     of    spaces
will print
This is a string with lots of spaces
To have a string as a single argument, put it between double quotes, e.g.
echo "This      is a       string with lots     of    spaces"
Double quotes also prevent file name expansion but still allow environment variable substitution. Try this with
echo "${HOME} = *"
To specify a string without environment variable substitution, use single quotes '', e.g.
echo '${HOME} = *'
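How the quotes change the argument list can be made visible by counting the arguments. This is a small sketch; count_args is a hypothetical helper function defined only for this example.

```shell
# A tiny helper that reports how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args This      is a       string)
quoted=$(count_args "This      is a       string")
echo "unquoted: $unquoted arguments, quoted: $quoted argument"

# Double quotes still substitute variables, single quotes do not.
name=world
double="hello ${name}"
single='hello ${name}'
echo "$double"
echo "$single"
```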

Command execution

When a command is entered, the bash first tries to apply an alias. Then it checks if the command is a builtin. To get a list of all builtins, use
help
With help command you get a short description of the particular command. If the command is not a builtin, the bash searches the ${PATH} for an executable with the name. Note that it does not search the current directory! If you also want the current directory to be searched, add it to the path with export PATH=${PATH}:.

Every program returns an 8 bit exit code (the return value of its int main() routine). Contrary to C, a value of 0 means true (no error) and any other value means false (an error has occurred). The latest exit code is always available in the special parameter $?.
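A quick way to watch exit codes is to run a child shell that exits with a chosen value. This is only a sketch; the value 3 and the variable names are arbitrary.

```shell
# Run a child shell that exits with code 3 and capture the code.
status=0
sh -c 'exit 3' || status=$?
echo "exit code: $status"

# true and false do nothing except return 0 and 1, respectively.
true
ok=$?
false || failed=$?
echo "true -> $ok, false -> $failed"
```

Note that $? is overwritten by every command, so save it in a variable of your own if you need it later.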

Control Operators: ; && || ( )

To execute more than one command in a single command line, you can use the control operators. Use the ';' operator to execute the commands one after another. If you want the following command to depend on the exit code of the previous one, use && or ||. If the exit code is 0 (i.e. no error), && will execute the following command, otherwise it stops. If you want the following command to be executed on a non-zero exit code (i.e. on an error), use ||.
grep -qs pattern filename && echo "The file contains the pattern!"
grep -qs pattern filename || echo "The file doesn't contain the pattern!"
You can execute a sub-shell by putting a list of commands in parentheses. This is especially useful in shell scripts
(
  echo "First line"
  cat /path/to/file
  grep pattern /path/to/*
  echo "Last line" 
) > outfile
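The control operators can be tried on a small temporary file. This is a sketch with invented file content and variable names. It combines && and || in one line, which works here because the assignment after && cannot fail.

```shell
tmp=$(mktemp)
printf 'alpha\nbeta\n' > "$tmp"

# && runs the right-hand side only on success, || only on failure.
grep -qs beta  "$tmp" && has_beta=yes  || has_beta=no
grep -qs gamma "$tmp" && has_gamma=yes || has_gamma=no
echo "beta: $has_beta, gamma: $has_gamma"

# The output of all commands in the sub-shell goes to one file.
out=$(mktemp)
(
  echo "First line"
  cat "$tmp"
  echo "Last line"
) > "$out"
lines=$(grep -c '' "$out")
echo "outfile has $lines lines"

rm "$tmp" "$out"
```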

Processes: &, ^Z, bg, fg, jobs

When a program is executed, it takes over control of the console. That means that all input to the console is directed to the program and the shell is paused. If you want a program to be executed in the background (with detached stdin), append the & operator to its command line.
xclock &
This is only useful for programs without interactivity at the console like most X programs and daemons. Note that the program's output is still printed to the console.

If you forgot to append the & operator, you can suspend any program by pressing [Ctrl]+[Z] at the console. The program is put on hold and the prompt of the shell is displayed. Then type bg to continue execution of the program in the background. To continue execution in the foreground, enter fg. With jobs you get a list of all currently running or suspended processes of the current shell.

Redirection

Every program uses stdin, stdout and stderr for input and output. These file descriptors can be redirected by the bash. To redirect the output of a program (stdout) to a file, use the '>' character
echo abc > filename
after the command and all its parameters. To append to an existing file, use '>>'.
echo abc >> filename
To redirect stderr, use
grep pattern /* 2> filename
To redirect stdin use the '<' character
tr abc ABC < filein > fileout
after the command and all its parameters.

To redirect the output of a program to the input of another program (pipe) use the '|' character:
tr abc ABC < filein | grep X | grep -v y
You can make long chains of commands connected through these pipes. The data is passed from one program to the next.
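All redirections described in this section fit into one short experiment. This is a sketch; the file names are made up, and the child shell only exists to produce one line on stdout and one on stderr (the '>&2' inside it sends the output of that echo to stderr).

```shell
dir=$(mktemp -d)

# stdout and stderr of the child go to two different files.
sh -c 'echo "to stdout"; echo "to stderr" >&2' > "$dir/out.txt" 2> "$dir/err.txt"

# >> appends instead of overwriting.
echo "appended" >> "$dir/out.txt"

# stdin comes from the file, the result flows through a pipe into grep.
upper=$(tr a-z A-Z < "$dir/out.txt" | grep -v APPENDED)
errors=$(cat "$dir/err.txt")
echo "stdout part: $upper"
echo "stderr part: $errors"

rm -r "$dir"
```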

Command Substitution

The output of a program can even be redirected to the command line of another program with back quotes or the $(command) syntax (the latter is recommended), e.g.
grep pattern $(find . -iname '*.c')
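One reason to prefer the $(command) syntax is that it can be nested without any escaping. A small sketch with an invented path:

```shell
path=/usr/local/bin/tool

# The inner substitution runs first; its output becomes the argument of basename.
parent=$(basename "$(dirname "$path")")
echo "parent directory: $parent"
```

With back quotes the inner pair would have to be escaped with backslashes, which quickly becomes unreadable.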

Mathematics

To calculate (with integers) use the following expression
echo "Next usable user ID is $(( $UID + 1 ))"
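A few more operators, again as a sketch with arbitrary numbers. Inside $(( )) variables may be written without the $ prefix.

```shell
a=7
b=3
sum=$(( a + b ))
quot=$(( a / b ))   # integer division truncates the result
rem=$(( a % b ))    # remainder of the division
echo "$a + $b = $sum, $a / $b = $quot remainder $rem"
```

For floating point calculations use an external program like bc.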

Control Structures: if, case, while, for, function

The bash is even more powerful, similar to full-featured programming languages. Use control structures like if
if [ -f /etc/passwd ] ; then cat /etc/passwd ; else echo "not found" ; fi
It wants a true value (i.e. exit code 0) to execute the first branch. Note the fi (reverse of if) as endif tag.
The command '[ ... ]' is a builtin and is equivalent to the builtin test. Its option '-f' checks whether the given file exists and is a regular file (see help test). It returns an exit code of 0 if the condition is true.
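Written over several lines and with an additional elif branch, the same construct looks like this. The sketch uses '-d' (exists and is a directory) besides '-f'; describe is a hypothetical helper function, and the temporary file and directory are created only for the test.

```shell
tmp=$(mktemp)      # an existing regular file
dir=$(mktemp -d)   # an existing directory

describe() {
  if [ -f "$1" ] ; then
    echo "regular file"
  elif [ -d "$1" ] ; then
    echo "directory"
  else
    echo "missing"
  fi
}

kind_file=$(describe "$tmp")
kind_dir=$(describe "$dir")
rm "$tmp"
rmdir "$dir"
kind_gone=$(describe "$tmp")
echo "$kind_file / $kind_dir / $kind_gone"
```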

A case looks like this
case ${VAR} in
  pattern1)
    commands
    ;;
  pattern2)
    commands
    ;;
esac
Note the ;; which are obligatory. The patterns are glob patterns. Note the esac (reverse of case) as the end tag. Unix programmers are lazy but like to have fun.
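A concrete case statement might look like the following sketch. classify is a hypothetical function name and the file names are examples only. Several patterns can share one branch when they are separated by '|', and '*' serves as the default branch.

```shell
classify() {
  case "$1" in
    *.tar.gz|*.tgz) echo "gzip tarball" ;;
    *.zip)          echo "zip archive" ;;
    *)              echo "unknown" ;;
  esac
}

a=$(classify backup.tar.gz)
b=$(classify photos.zip)
c=$(classify notes.txt)
echo "$a, $b, $c"
```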

This while loop
while true ; do ls -l filename ; sleep 1 ; done
shows the file (including its size) and then sleeps for 1 second. Then the loop is repeated (infinitely). Press [Ctrl]-[C] to terminate the loop. A loop to utilize 100% CPU looks like this
while : ; do : ; done
':' is equivalent to true and does nothing except returning an exit code of 0.

Another loop type is the for loop. Contrary to popular programming languages it doesn't count an integer value but iterates through a list.
for i in *.txt ; do echo "I found a file named $i" ; done
Here the *.txt is expanded to the list of all matching files. You can also use command substitution
for i in $(find . -iname '*.txt') ; do echo "I found a file named $i" ; done
If you need a list of numbers, use the seq program
for i in $(seq 1 12) ; do echo $i ; done
Note: Never forget the 'do' in loops, the 'then' after an if, and the 'in' after a case.
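One pitfall of the two list sources is worth a small experiment: the output of a command substitution is split at every whitespace, while a wildcard yields exactly one word per file. The sketch below assumes that the path of the temporary directory itself contains no spaces. The file names are invented.

```shell
dir=$(mktemp -d)
touch "$dir/a b.txt" "$dir/c.txt"

# Glob expansion yields one word per file, spaces included.
glob_count=0
for f in "$dir"/*.txt ; do
  glob_count=$(( glob_count + 1 ))
done

# The output of find is split into words, so "a b.txt" arrives as two words.
find_count=0
for f in $(find "$dir" -name '*.txt') ; do
  find_count=$(( find_count + 1 ))
done

echo "glob: $glob_count, find: $find_count"
rm -r "$dir"
```

So for files in a single directory the wildcard form is the more robust choice.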

Scripts

Shell scripts have the following structure
#!/bin/bash
#
# Description, author, date, ...
#

commands
The first line always has to be this (so called shebang), because it tells the kernel to execute the script with the interpreter /bin/bash. Comments are introduced by the # character and can start at the beginning of a line or anywhere later in it. All commands are equivalent to the ones described above. You can put commands in new lines instead of separating them by ;.

Within shell scripts you have some more predefined special variables.
$0 the path to the script as it was invoked
$1, $2, ... first, second, ... command line parameter, see also help shift
$# count of command line parameters
$* All command line parameters. When the expansion occurs within double quotes, it expands to a single word with the value of each parameter separated by the first character of the IFS special variable.
$@ All command line parameters. When the expansion occurs within double quotes, each parameter expands to a separate word.
$? status of the most recently executed foreground process
$$ PID of the current shell (i.e. script)
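The difference between "$@" and "$*" is easiest to see by counting words. In this sketch a shell function stands in for a script, because functions receive their own $1, $2, ... in the same way. demo and count are hypothetical names.

```shell
count() { echo "$#"; }

demo() {
  echo "called with $# parameters, the first one is '$1'"
  with_at=$(count "$@")     # every parameter stays a separate word
  with_star=$(count "$*")   # all parameters are joined into a single word
  shift                     # drop $1, the former $2 becomes $1
  after_shift=$1
}

demo "first arg" second third
echo "\"\$@\": $with_at words, \"\$*\": $with_star word, \$1 after shift: $after_shift"
```

When you pass the parameters of a script on to another program, "$@" is therefore almost always what you want.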

Yet Another Subversion Introduction

Concept

Subversion organizes your data at two distinct places. The server holds the repository. This contains the full set of files. For every file its current (i.e. most up to date) version is kept. Additionally the full history (i.e. all previous versions) of every file is kept. Usually you will be interested in the current version of all files, but at certain circumstances (e.g. doing a release, correcting bugs in such releases, comparing your changes to previous versions, ...) you will need the old versions of the files.

Everybody who wants to work with these files has to check out his own working copy from the repository. These are normal files on your hard disk which you can work on with every program (e.g. editor, ...) you want. When you have made your changes to the files, you commit these changes back to the repository. That's how a new version of several files is generated.

It is important to know that SVN numbers these versions with a progressive integer starting from 0. This number is called the revision. All current files within the repository have the same revision number. This is different from SVN's predecessor CVS, which had a separate revision number for every file.

But what happens when several people have checked out their working copies and make independent changes? They commit their changes at different times, and SVN will not let the later one commit changes to files that are out of date. First he has to update his working copy, and it is SVN's job to merge the changes of the other people into it. Usually this works fine, even if two persons make changes to the same file, as long as these changes occur at different places in this file.

Now think of your working copy. When somebody commits his changes to the repository, your working copy will not be notified of this repository change and thus gets outdated. This is what the SVN update command is for. It retrieves the most up-to-date revision from the repository and merges these changes into your working copy. Note that update is the smartest subcommand of SVN. It will not destroy your own changes.

Background info: Your working copy stores every file twice on your hard disk. Once for you and a second time in a hidden directory ( .svn/text-base/filename.ext.svn-base). The disadvantage is that a working copy uses twice the disk space you would expect. The advantage is that no network connection is necessary to compare your changes to the originally checked-out revision.

CLI Frontend

There exist several front ends for SVN. Here only the command line interface is described. SVN can give you help in any situation. Simply type
svn help
You can even get info about every subcommand like
svn help checkout

Several subcommands exist:

Command                   Description
svn checkout [svnpath]    get a working copy of the repository or of a part of it (1) (5) (6)
svn update                update your working copy with the newest revision from the repository (5) (7)
svn status                compare your working copy with the repository (exactly: the copy in the hidden directory) and show which files you have modified, added, removed, ... (7)
svn diff [file]           compare a file to its hidden twin
svn commit                upload your changes to the server and integrate them into the repository, a new revision is created (5) (7)
svn add [file]            add files or directories (recursively) to the version control (2)
svn mkdir [dir]           create a new directory (2)
svn cp [file1] [file2]    copy a file including its history (2) (3)
svn mv [file1] [file2]    move a file maintaining its history (2) (4)
svn rm [file]             remove a file from the current revision (2)

Some notes:

  1. You can also checkout only a part of the repository by specifying a subdirectory.
  2. Never directly create, rename or remove files or directories. Always use the SVN commands for that. Otherwise you will have troubles committing these changes. The only exception is creating new files. Just create it with the program you like and then add it to the version control.
  3. SVN can do so called cheap copies. That means that a copy of a file uses only very little space in the repository, because it is stored as just a pointer to the single instance of the original file. Such copies even retain the full history of the copied files and directories. As soon as you modify the copy and commit these changes, a new trail (in terms of history) for this file is created, while the original file remains at its current state. The copy has the full history (from when it was a single file) up to its most current version.
  4. A move (rename) is nothing else than a (cheap!) copy (maintaining the full history) followed by removing the original file.
  5. Only the subcommands "checkout", "update" and "commit" connect to the repository (i.e. server). All other commands only modify the working copy or refer to the locally stored copies of every file.
  6. Except for the "checkout" subcommand you never need the repository path (e.g. https://svn.server.tld/path/to/repository). This is stored in the hidden metadata files.
  7. You can also do an "update", "status", "commit", ... for only a subtree of your repository/working copy. Simply change your current working directory (cd) to a sub-path and invoke the desired command.
  8. When committing you are asked to enter a comment about what you have changed. Always describe your changes! On Windows (TortoiseSVN) you have a dialog box with a large edit box for this comment. In Unix your default editor (defined by the environment variable EDITOR, or vi as default) is launched and you can type and edit your comment. Note: to save and quit vi, hit [Esc] and then type ':wq' [Enter].

Line Endings

In plain text files (e.g. program source) the line ends are marked with special control characters. Unfortunately they are not the same across different operating systems. Unix uses the plain LF character, Windows uses CR-LF, and the classic Mac OS (before Mac OS X) used CR. If your project is developed on different platforms, you have to take care of this so called end of line (EOL) style.

Imagine a C++ source file is created on a Linux box and thus uses LF as EOL. When this file is checked out on Windows and the editor converts the EOLs to CR-LF, every single line looks changed (due to the additional CR character). This is bad practice and will disturb the usage of SVN.
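The additional CR character can be made visible without any SVN involved. The following sketch writes the same two lines with both EOL styles and compares them. The file names are made up for the example.

```shell
dir=$(mktemp -d)

# The same two lines, once with Windows (CR-LF) and once with Unix (LF) line ends.
printf 'line1\r\nline2\r\n' > "$dir/dos.txt"
printf 'line1\nline2\n'     > "$dir/unix.txt"

dos_size=$(( $(wc -c < "$dir/dos.txt") ))
unix_size=$(( $(wc -c < "$dir/unix.txt") ))
echo "CR-LF: $dos_size bytes, LF: $unix_size bytes"

# Stripping the CR characters makes both files identical.
tr -d '\r' < "$dir/dos.txt" > "$dir/converted.txt"
if cmp -s "$dir/converted.txt" "$dir/unix.txt" ; then same=yes ; else same=no ; fi
echo "identical after conversion: $same"

rm -r "$dir"
```

One extra byte per line is exactly what makes every single line look changed in a diff.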

To circumvent this problem, SVN can help you by doing checkout, commit and update using the native EOL-style. So in the above example the C++ file will have CR-LF when checked out on Windows, although it was saved with LF on a Linux box. When changes done on the Windows box are updated to the Linux box, their line endings will be converted to LF.

To get this assistance from SVN you have to set the property svn:eol-style to the value native for all your source files.

Linux

You can do this on Unix systems using the following command:
svn propset svn:eol-style native *.cc *.h

Windows (on existing files)

  1. right-click on the file(s)
  2. select TortoiseSVN
  3. select Properties
  4. select Add
  5. select property svn:eol-style
  6. write in the Property value native
  7. ok

Windows (on all future files)

You can do this on Windows with TortoiseSVN doing:
  1. right-click on any file
  2. select TortoiseSVN
  3. select Settings
  4. select the general tab
  5. click on edit button
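  6. make sure that the line enable-auto-props = yes in the [miscellany] section is not commented out, then find the lines
    ### Section for configuring automatic properties.
    [auto-props]
    and add this after them: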
    *.c = svn:eol-style=native
    *.cc = svn:eol-style=native
    *.cpp = svn:eol-style=native
    *.h = svn:eol-style=native
    *.txt = svn:eol-style=native
    *.png = svn:mime-type=image/png
    *.pdf = svn:mime-type=application/pdf
    *.jpg = svn:mime-type=image/jpeg
    Makefile = svn:eol-style=native