Key Points
- Automated Version Control
- Setting Up Git
- Creating a Repository
- Tracking Changes
- Exploring History
- Ignoring Things
- Remotes in GitHub
- Collaborating
- Conflicts
Summary of Basic Commands
Action | Files | Folders |
---|---|---|
Inspect | ls | ls |
View content | cat | ls |
Navigate to | | cd |
Move | mv | mv |
Copy | cp | cp -r |
Create | nano | mkdir |
Delete | rm | rmdir, rm -r |
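The commands in the table can be tried out safely in a scratch directory. The following is a minimal sketch, assuming a Bash-like shell; the names `data`, `backup`, `empty`, `notes.txt`, and `copy.txt` are hypothetical, and `printf` with a redirect stands in for creating a file interactively with `nano`.

```shell
# Work in a throwaway directory so nothing important is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir data                          # Create a folder
printf 'first line\n' > notes.txt   # Create a file (nano would do this interactively)
ls                                  # Inspect the current folder
cat notes.txt                       # View a file's content
cp notes.txt data/copy.txt          # Copy a file
cp -r data backup                   # Copy a folder and everything in it
mv notes.txt data/notes.txt         # Move a file
ls data                             # View a folder's content
rm data/copy.txt                    # Delete a file
rm -r backup                        # Delete a folder that is not empty
mkdir empty
rmdir empty                         # Delete an empty folder
```

Note that `rmdir` only removes empty folders, which is why `rm -r` is listed as the alternative for folders that still contain files.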
Filesystem hierarchy
A standard Unix filesystem is arranged as a tree of directories below the root directory. The exact hierarchy depends on the platform, so your file/directory structure may differ slightly.
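To compare your own system with the usual layout, you can list what sits directly under the root directory. This is a minimal sketch, assuming a Bash-like shell; the directories checked in the loop (`/bin`, `/etc`, `/home`, `/tmp`, `/usr`) are common on many Unix systems but are only examples and may be absent on yours.

```shell
# List the top-level directories directly under the root directory.
ls /

# Print the current working directory as an absolute path.
pwd

# Check whether a few commonly seen directories exist on this system.
for dir in /bin /etc /home /tmp /usr
do
    if [ -d "$dir" ]; then
        echo "$dir exists"
    else
        echo "$dir is not present on this platform"
    fi
done
```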
Glossary
- absolute path
- A path that refers to a particular location in a file system. Absolute paths are usually written with respect to the file system’s root directory, and begin with either “/” (on Unix) or “\” (on Microsoft Windows). See also: relative path.
- argument
- A value given to a function or program when it runs. The term is often used interchangeably (and inconsistently) with parameter.
- command shell
- See shell
- command-line interface
- A user interface based on typing commands, usually at a REPL. See also: graphical user interface.
- comment
- A remark in a program that is intended to help human readers understand what is going on, but is ignored by the computer. Comments in Python, R, and the Unix shell start with a # character and run to the end of the line; comments in SQL start with --, and other languages have other conventions.
- current working directory
- The directory that relative paths are calculated from; equivalently, the place where files referenced by name only are searched for. Every process has a current working directory. The current working directory is usually referred to using the shorthand notation . (pronounced “dot”).
- file system
- A set of files, directories, and I/O devices (such as keyboards and screens). A file system may be spread across many physical devices, or many file systems may be stored on a single physical device; the operating system manages access.
- filename extension
- The portion of a file’s name that comes after the final “.” character. By convention this identifies the file’s type: .txt means “text file”, .png means “Portable Network Graphics file”, and so on. These conventions are not enforced by most operating systems: it is perfectly possible (but confusing!) to name an MP3 sound file homepage.html. Since many applications use filename extensions to identify the MIME type of the file, misnaming files may cause those applications to fail.
- filter
- A program that transforms a stream of data. Many Unix command-line tools are written as filters: they read data from standard input, process it, and write the result to standard output.
- flag
- A terse way to specify an option or setting to a command-line program. By convention Unix applications use a dash followed by a single letter, such as -v, or two dashes followed by a word, such as --verbose, while DOS applications use a slash, such as /V. Depending on the application, a flag may be followed by a single argument, as in -o /tmp/output.txt.
- for loop
- A loop that is executed once for each value in some kind of set, list, or range. See also: while loop.
- graphical user interface
- A user interface based on selecting items and actions from a graphical display, usually controlled by using a mouse. See also: command-line interface.
- home directory
- The default directory associated with an account on a computer system. By convention, all of a user’s files are stored in or below her home directory.
- loop
- A set of instructions to be executed multiple times. Consists of a loop body and (usually) a condition for exiting the loop. See also: for loop, while loop.
- loop body
- The set of statements or commands that are repeated inside a for loop or while loop.
- MIME type
- MIME (Multipurpose Internet Mail Extensions) types describe different file types for exchange on the Internet, for example, images, audio, and documents.
- operating system
- Software that manages interactions between users, hardware, and software processes. Common examples are Linux, macOS, and Windows.
- parameter
- A variable named in a function’s declaration that is used to hold a value passed into the call. The term is often used interchangeably (and inconsistently) with argument.
- parent directory
- The directory that “contains” the one in question. Every directory in a file system except the root directory has a parent. A directory’s parent is usually referred to using the shorthand notation .. (pronounced “dot dot”).
- path
- A description that specifies the location of a file or directory within a file system. See also: absolute path, relative path.
- pipe
- A connection from the output of one program to the input of another. When two or more programs are connected in this way, they are called a “pipeline”.
- process
- A running instance of a program, containing code, variable values, open files and network connections, and so on. Processes are the “actors” that the operating system manages; it typically runs each process for a few milliseconds at a time to give the impression that they are executing simultaneously.
- prompt
- A character or characters displayed by a REPL to show that it is waiting for its next command.
- quoting
- (in the shell): Using quotation marks of various kinds to prevent the shell from interpreting special characters. For example, to pass the string *.txt to a program, it is usually necessary to write it as '*.txt' (with single quotes) so that the shell will not try to expand the * wildcard.
- read-evaluate-print loop
- (REPL): A command-line interface that reads a command from the user, executes it, prints the result, and waits for another command.
- redirect
- To send a command’s output to a file rather than to the screen or another command, or equivalently to read a command’s input from a file.
- regular expression
- A pattern that specifies a set of character strings. REs are most often used to find sequences of characters in strings.
- relative path
- A path that specifies the location of a file or directory with respect to the current working directory. Any path that does not begin with a separator character (“/” or “\”) is a relative path. See also: absolute path.
- root directory
- The top-most directory in a file system. Its name is “/” on Unix (including Linux and macOS) and “\” on Microsoft Windows.
- shell
- A command-line interface such as Bash (the Bourne-Again Shell) or the Microsoft Windows DOS shell that allows a user to interact with the operating system.
- shell script
- A set of shell commands stored in a file for re-use. A shell script is a program executed by the shell; the name “script” is used for historical reasons.
- standard input
- A process’s default input stream. In interactive command-line applications, it is typically connected to the keyboard; in a pipe, it receives data from the standard output of the preceding process.
- standard output
- A process’s default output stream. In interactive command-line applications, data sent to standard output is displayed on the screen; in a pipe, it is passed to the standard input of the next process.
- sub-directory
- A directory contained within another directory.
- tab completion
- A feature provided by many interactive systems in which pressing the Tab key triggers automatic completion of the current word or command.
- variable
- A name in a program that is associated with a value or a collection of values.
- while loop
- A loop that keeps executing as long as some condition is true. See also: for loop.
- wildcard
- A character used in pattern matching. In the Unix shell, the wildcard * matches zero or more characters, so that *.txt matches all files whose names end in .txt.
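Several of the shell terms defined above (redirect, wildcard, quoting, pipe, filter, variable, for loop, while loop) can be seen working together in one short session. The following is a minimal sketch, assuming a Bash-like shell; the file names `a.txt`, `b.txt`, and `line_counts.out` are hypothetical and chosen only for illustration.

```shell
# Work in a throwaway directory; all file names below are made up for illustration.
workdir=$(mktemp -d)
cd "$workdir"

# Redirect: send output to files instead of the screen.
printf 'one\ntwo\nthree\n' > a.txt
printf 'four\n' > b.txt

# Wildcard: the shell expands *.txt to the matching file names before echo runs.
echo *.txt

# Quoting: single quotes stop the expansion, so echo receives the literal string.
echo '*.txt'

# Pipe and filter: wc writes line counts to standard output, and sort reads them
# from standard input and orders them numerically. The result is redirected to a
# file whose name does not end in .txt, so it cannot be picked up by the wildcard.
wc -l *.txt | sort -n > line_counts.out
cat line_counts.out

# For loop: the body runs once per matching file; f is a variable.
for f in *.txt
do
    echo "found $f"
done

# While loop: the body repeats as long as the condition is true.
count=3
while [ "$count" -gt 0 ]
do
    echo "countdown: $count"
    count=$((count - 1))
done
```

The output file is deliberately given a different extension: writing the result of `wc -l *.txt` to a file that itself matches `*.txt` can make the new file show up in its own input.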
External references
Opening a terminal
- How to Use Terminal on a Mac
- Git for Windows
- How to Install Bash shell command-line tool on Windows 10
- Install and Use the Linux Bash Shell on Windows 10
- Using the Windows 10 Bash Shell
- Using a UNIX/Linux emulator (Cygwin) or Secure Shell (SSH) client (Putty)
Manuals
Miscellaneous
- North Pacific Gyre
- Great Pacific Garbage Patch
- ‘Ensuring the longevity of digital information’ by Jeff Rothenberg
- Computer error haikus
- How to name files nicely, by Jenny Bryan
Git Cheatsheets for Quick Reference
- Printable Git cheatsheets are available in several languages (including an English version). More material is available from the GitHub training website.
- An interactive one-page visualisation about the relationships between workspace, staging area, local repository, upstream repository, and the commands associated with each (with explanations).
- Both resources are also available in other languages (e.g. Spanish, French, and more).
- “Happy Git and GitHub for the useR” is an accessible, free online book by Jenny Bryan on how to set up and use Git and GitHub, with specific guidance on integrating Git with RStudio and working with Git in R.
- Open Scientific Code using Git and GitHub - A collection of explanations and short practical exercises to help researchers learn more about version control and open source software.
Glossary
- changeset
- A group of changes to one or more files that are or will be added to a single commit in a version control repository.
- commit
- To record the current state of a set of files (a changeset) in a version control repository. As a noun, the result of committing, i.e. a recorded changeset in a repository. If a commit contains changes to multiple files, all of the changes are recorded together.
- conflict
- A change made by one user of a version control system that is incompatible with changes made by other users. Helping users resolve conflicts is one of version control’s major tasks.
- HTTP
- The Hypertext Transfer Protocol used for sharing web pages and other data on the World Wide Web.
- merge
- (a repository): To reconcile two sets of changes to a repository.
- protocol
- A set of rules that define how one computer communicates with another. Common protocols on the Internet include HTTP and SSH.
- remote
- (of a repository) A version control repository connected to another in such a way that both can be kept in sync by exchanging commits.
- repository
- A storage area where a version control system stores the full history of commits of a project and information about who changed what, when.
- resolve
- To eliminate the conflicts between two or more incompatible changes to a file or set of files being managed by a version control system.
- revision
- A synonym for commit.
- SHA-1
- Git uses SHA-1 hashes to compute identifiers, including those for commits. To compute these, Git uses not only the actual change in a commit, but also its metadata (such as date, author, and message), including the identifiers of the commits that precede it. This makes Git commit IDs virtually unique: the likelihood that two commits made independently, even of the same change, receive the same ID is exceedingly small.
- SSH
- The Secure Shell protocol used for secure communication between computers.
- timestamp
- A record of when a particular event occurred.
- version control
- A tool for managing changes to a set of files. Each set of changes creates a new commit of the files; the version control system allows users to recover old commits reliably, and helps manage conflicting changes made by different users.
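The SHA-1 entry above says that an identifier depends on both the change and its metadata. The sketch below imitates that idea with the standard `sha1sum` tool (falling back to `shasum` where `sha1sum` is not installed). It is not Git's actual object format, and the helper `sha1_of`, the author names, the date, and the change text are all hypothetical.

```shell
# Hash whatever arrives on standard input and print only the 40-character digest.
sha1_of() {
    if command -v sha1sum >/dev/null 2>&1; then
        sha1sum | cut -d ' ' -f 1
    else
        shasum -a 1 | cut -d ' ' -f 1
    fi
}

# Hypothetical change and metadata; Git's real commit objects are laid out differently.
change='add a line to notes.txt'

id_alice=$(printf '%s\n' "author: Alice" "date: 2024-01-01" "$change" | sha1_of)
id_bob=$(printf '%s\n' "author: Bob" "date: 2024-01-01" "$change" | sha1_of)
id_alice_again=$(printf '%s\n' "author: Alice" "date: 2024-01-01" "$change" | sha1_of)

echo "Alice: $id_alice"
echo "Bob:   $id_bob"

# Same change, different metadata: the identifiers differ.
[ "$id_alice" != "$id_bob" ] && echo "different metadata, different ID"

# Identical input always produces the identical identifier.
[ "$id_alice" = "$id_alice_again" ] && echo "same input, same ID"
```

In a real repository, `git log` shows the identifiers that Git itself computed for each commit.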