CAT(1) User Commands CAT(1)
NAME
cat - concatenate files and print on the standard output
SYNOPSIS
cat [OPTION]... [FILE]...
DESCRIPTION
Concatenate FILE(s), or standard input, to standard output.
-A, --show-all
equivalent to -vET
-b, --number-nonblank
number nonempty output lines
-e equivalent to -vE
-E, --show-ends
display $ at end of each line
-n, --number
number all output lines
-s, --squeeze-blank
suppress repeated empty output lines
-t equivalent to -vT
-T, --show-tabs
display TAB characters as ^I
-u (ignored)
-v, --show-nonprinting
use ^ and M- notation, except for LFD and TAB
--help display this help and exit
--version
output version information and exit
With no FILE, or when FILE is -, read standard input.
EXAMPLES
cat f - g
Output f’s contents, then standard input, then g’s contents.
cat Copy standard input to standard output.
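The options above can be tried on a small sample. This is only an illustrative sketch that assumes GNU cat; the temporary file and the variable names are made up for the example.

```shell
# Sample input: one text line, two empty lines, then a line starting with a TAB.
sample=$(mktemp)
printf 'a\n\n\n\tb\n' > "$sample"

# -s collapses the run of empty lines; -A (same as -vET) marks each line end
# with $ and shows the TAB as ^I.
squeezed=$(cat -s "$sample" | cat -A)
printf '%s\n' "$squeezed"

# -n numbers every line; -b numbers only the non-empty ones.
numbered_lines=$(cat -n "$sample" | wc -l)
last_nonblank=$(cat -b "$sample" | tail -n 1)

# "-" stands for standard input, so the piped line lands between the two files.
joined_lines=$(printf 'middle\n' | cat "$sample" - "$sample" | wc -l)

rm -f "$sample"
```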
AUTHOR
Written by Torbjorn Granlund and Richard M. Stallman.
REPORTING BUGS
Report cat bugs to bug-coreutils@gnu.org
GNU coreutils home page: <http://www.gnu.org/software/coreutils/>
General help using GNU software: <http://www.gnu.org/gethelp/>
Report cat translation bugs to <http://translationproject.org/team/>
COPYRIGHT
Copyright © 2010 Free Software Foundation, Inc. License GPLv3+: GNU
GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
SEE ALSO
The full documentation for cat is maintained as a Texinfo manual. If
the info and cat programs are properly installed at your site, the command
info coreutils 'cat invocation'
should give you access to the complete manual.
GNU coreutils 8.4 February 2017 CAT(1)
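Basic Linux Commands - cat
cat is one of the first commands that Linux/Unix beginners learn. Its name comes from concatenate (or catenate). cat takes file names as arguments and joins their contents in order; in effect, it reads its input and writes it unchanged to the terminal.
$ cat [OPTION] [FILE]
The simplest usage is as follows.
$ cat file
$ cat file1 file2 file3
That is, cat prints the contents of one or more files to the screen. Output keeps scrolling even when it exceeds one screen, which makes long files hard to read, so cat is usually combined with more or less.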
The second usage is
$ cat file(s) > file2
This merges several files into one larger file. Note that the output file should be a new file rather than an existing one: if it already exists, it is overwritten and its previous contents are lost.
$ cat file file2 file3 > file4
This merges the three files into a new file, file4. The contents of file, file2, and file3 are left unchanged.
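The third usage is
$ cat file1 >> file2
This appends the contents of file1 to the end of the existing file2. No new file is created: file2 keeps its old contents and gains the contents of file1 at the end, while file1 itself is unchanged.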
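The difference between > and >> can be checked with a few throwaway files. A minimal sketch; the directory comes from mktemp and the file contents are invented for the example.

```shell
workdir=$(mktemp -d)
printf 'one\n'   > "$workdir/file1"
printf 'two\n'   > "$workdir/file2"
printf 'three\n' > "$workdir/file3"

# ">" creates (or overwrites) file4 with the three files joined in order.
cat "$workdir/file1" "$workdir/file2" "$workdir/file3" > "$workdir/file4"

# ">>" appends to file2 instead of overwriting it.
cat "$workdir/file1" >> "$workdir/file2"

merged=$(cat "$workdir/file4")
appended=$(cat "$workdir/file2")
printf '%s\n---\n%s\n' "$merged" "$appended"
rm -rf "$workdir"
```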
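The fourth usage is
$ cat > new_file
This creates a new file. Like touch new_file, it creates the file, but here whatever you then type on standard input (the keyboard) is saved into it. Press CTRL-d to end the input, and the new file is written with what you typed. This is handy for beginners who are not yet comfortable with an editor.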
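In a script there is no keyboard to type on, so a here-document can play the role of the typed input. A small sketch; the path and the text are placeholders.

```shell
workdir=$(mktemp -d)

# Interactive form: run `cat > new_file`, type the text, finish with CTRL-d.
# Scripted equivalent: the here-document supplies standard input.
cat > "$workdir/new_file" <<'EOF'
first line
second line
EOF

created=$(cat "$workdir/new_file")
printf '%s\n' "$created"
rm -rf "$workdir"
```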
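Options
- -b: Number output lines on the left, skipping empty lines.
- -e: Equivalent to -vE: show control characters in ^ notation and append $ to the end of each line.
- -n: Number output lines on the left, including empty lines.
- -s: Squeeze two or more consecutive empty lines into a single one.
- -v: Show control characters in ^ notation, except for tab and newline.
- -E: Display $ at the end of each line.
- -T: Display tab characters as ^I.
- -A: Equivalent to -vET.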
Tip
When a file is too long to fit on one screen and scrolls past, piping it through more makes it easier to read.
$ cat file | more
TAIL(1) User Commands TAIL(1)
NAME
tail - output the last part of files
SYNOPSIS
tail [OPTION]... [FILE]...
DESCRIPTION
Print the last 10 lines of each FILE to standard output. With more
than one FILE, precede each with a header giving the file name. With
no FILE, or when FILE is -, read standard input.
Mandatory arguments to long options are mandatory for short options
too.
-c, --bytes=K
output the last K bytes; alternatively, use -c +K to output
bytes starting with the Kth of each file
-f, --follow[={name|descriptor}]
output appended data as the file grows; -f, --follow, and
--follow=descriptor are equivalent
-F same as --follow=name --retry
-n, --lines=K
output the last K lines, instead of the last 10; or use -n +K to
output lines starting with the Kth
--max-unchanged-stats=N
with --follow=name, reopen a FILE which has not changed size
after N (default 5) iterations to see if it has been unlinked or
renamed (this is the usual case of rotated log files). With
inotify, this option is rarely useful.
--pid=PID
with -f, terminate after process ID, PID dies
-q, --quiet, --silent
never output headers giving file names
--retry
keep trying to open a file even when it is or becomes inaccessible;
useful when following by name, i.e., with --follow=name
-s, --sleep-interval=N
with -f, sleep for approximately N seconds (default 1.0) between
iterations.
With inotify and --pid=P, check process P at least once every N
seconds.
-v, --verbose
always output headers giving file names
--help display this help and exit
--version
output version information and exit
If the first character of K (the number of bytes or lines) is a ‘+’,
print beginning with the Kth item from the start of each file, otherwise,
print the last K items in the file. K may have a multiplier
suffix: b 512, kB 1000, K 1024, MB 1000*1000, M 1024*1024, GB
1000*1000*1000, G 1024*1024*1024, and so on for T, P, E, Z, Y.
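The two meanings of K can be compared side by side. A minimal sketch assuming GNU tail and seq; the inputs are generated on the fly.

```shell
# Twenty numbered lines as sample input.
last3=$(seq 1 20 | tail -n 3)                # last 3 lines: 18 19 20
from18=$(seq 1 20 | tail -n +18)             # starting with line 18: 18 19 20
last_bytes=$(printf 'abcdef' | tail -c 3)    # last 3 bytes: def
from_byte3=$(printf 'abcdef' | tail -c +3)   # starting with byte 3: cdef
# Multiplier suffix: 1K is 1024 bytes, more than the 6 bytes available here,
# so the whole input is printed.
with_suffix=$(printf 'abcdef' | tail -c 1K)
printf '%s\n' "$last3" "$from18" "$last_bytes" "$from_byte3" "$with_suffix"
```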
With --follow (-f), tail defaults to following the file descriptor,
which means that even if a tail’ed file is renamed, tail will continue
to track its end. This default behavior is not desirable when you
really want to track the actual name of the file, not the file descriptor
(e.g., log rotation). Use --follow=name in that case. That causes
tail to track the named file in a way that accommodates renaming,
removal and creation.
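Following normally runs until interrupted, so a self-contained demonstration needs a way to stop. The sketch below assumes GNU tail and uses --pid for that; the writer loop, the log path, and the timings are hypothetical stand-ins for a real logging process.

```shell
workdir=$(mktemp -d)
log="$workdir/app.log"
: > "$log"

# Hypothetical writer process: appends three lines, then exits.
( for i in 1 2 3; do echo "event $i" >> "$log"; sleep 0.2; done ) &
writer=$!

# -n +1 starts at the first line so nothing is missed; --follow=name tracks
# the file by name; --pid makes tail exit once the writer has died;
# -s 0.1 shortens the polling interval.
followed=$(tail -n +1 --follow=name --pid="$writer" -s 0.1 "$log")
printf '%s\n' "$followed"
rm -rf "$workdir"
```

For a log that is rotated while being watched, -F (that is, --follow=name --retry) is the usual choice.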
AUTHOR
Written by Paul Rubin, David MacKenzie, Ian Lance Taylor, and Jim Meyering.
REPORTING BUGS
Report tail bugs to bug-coreutils@gnu.org
GNU coreutils home page: <http://www.gnu.org/software/coreutils/>
General help using GNU software: <http://www.gnu.org/gethelp/>
Report tail translation bugs to <http://translationproject.org/team/>
COPYRIGHT
Copyright © 2010 Free Software Foundation, Inc. License GPLv3+: GNU
GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
SEE ALSO
The full documentation for tail is maintained as a Texinfo manual. If
the info and tail programs are properly installed at your site, the
command
info coreutils 'tail invocation'
should give you access to the complete manual.
GNU coreutils 8.4 February 2017 TAIL(1)
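Basic Linux Commands - tail
tail prints the end of a file, counting back from the last line up to a specified number of lines. By default it prints the last 10 lines. (head, by contrast, prints the beginning of a file.) Unlike cat, more, less, and other commands that display file contents, tail is especially useful on Linux for watching error and log files in real time.
$ tail [OPTION] [FILE]
Without options, tail writes the last 10 lines (the default) of the file to standard output. Given several files, it prints the last 10 lines of each, preceded by a header with the file name. With no file name, it reads standard input.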
$ tail test.txt
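With more than one file, the per-file headers look as follows. A small sketch assuming GNU tail; the file names and contents are invented.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a1\na2\n' > a.txt
printf 'b1\nb2\n' > b.txt

# Each file's output is preceded by a "==> name <==" header.
with_headers=$(tail -n 1 a.txt b.txt)
# -q suppresses the headers.
without_headers=$(tail -q -n 1 a.txt b.txt)
printf '%s\n' "$with_headers" '' "$without_headers"

cd / && rm -rf "$workdir"
```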
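To print a specific number of lines instead of the default, pass the count with the -n option as shown below. Conversely, to print from a given line through the end of the file, prefix the number with '+'.
$ tail -n 20 test.txt
$ tail -n +20 test.txt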
You can also specify the amount in bytes instead of lines.
$ tail -c 200 test.txt
To monitor error or log files in real time, use the -f option. With -f, tail prints the last 10 lines and then, instead of exiting, keeps printing data as it is appended to the file. Press Ctrl + c to stop.
$ tail -f /var/log/messages.log
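Options
- -c, --bytes=K : Print the last K bytes of the file instead of counting lines. With -c +K, print from the Kth byte through the end of the file.
- -f, --follow[={name|descriptor}] : Print the last 10 lines, then keep running and print data as it is appended to the file. -f, --follow, and --follow=descriptor are equivalent, so any of them may be used.
- -F : Same as --follow=name --retry. Follow the file in real time and, when the file is replaced after some time (as with rotated log files), open the new file and keep printing.
- -n, --lines=K : Print the last K lines instead of the last 10.
- -q : Never print headers giving file names.
- -s : Used with -f; check roughly every N seconds (default 1.0) whether new data has been appended to the file.
- -v : Always print headers giving file names.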
GREP(1) GREP(1)
NAME
grep, egrep, fgrep - print lines matching a pattern
SYNOPSIS
grep [OPTIONS] PATTERN [FILE...]
grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]
DESCRIPTION
grep searches the named input FILEs (or standard input if no files are
named, or if a single hyphen-minus (-) is given as file name) for lines
containing a match to the given PATTERN. By default, grep prints the
matching lines.
In addition, two variant programs egrep and fgrep are available. egrep
is the same as grep -E. fgrep is the same as grep -F. Direct
invocation as either egrep or fgrep is deprecated, but is provided to
allow historical applications that rely on them to run unmodified.
OPTIONS
Generic Program Information
--help Print a usage message briefly summarizing these command-line
options and the bug-reporting address, then exit.
-V, --version
Print the version number of grep to the standard output stream.
This version number should be included in all bug reports (see
below).
Matcher Selection
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see
below). (-E is specified by POSIX.)
-F, --fixed-strings, --fixed-regexp
Interpret PATTERN as a list of fixed strings, separated by
newlines, any of which is to be matched. (-F is specified by
POSIX, --fixed-regexp is an obsoleted alias, please do not use
it in new scripts.)
-G, --basic-regexp
Interpret PATTERN as a basic regular expression (BRE, see
below). This is the default.
-P, --perl-regexp
Interpret PATTERN as a Perl regular expression. This is highly
experimental and grep -P may warn of unimplemented features.
Matching Control
-e PATTERN, --regexp=PATTERN
Use PATTERN as the pattern. This can be used to specify
multiple search patterns, or to protect a pattern beginning with
a hyphen (-). (-e is specified by POSIX.)
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file
contains zero patterns, and therefore matches nothing. (-f is
specified by POSIX.)
-i, --ignore-case
Ignore case distinctions in both the PATTERN and the input
files. (-i is specified by POSIX.)
-v, --invert-match
Invert the sense of matching, to select non-matching lines. (-v
is specified by POSIX.)
-w, --word-regexp
Select only those lines containing matches that form whole
words. The test is that the matching substring must either be
at the beginning of the line, or preceded by a non-word
constituent character. Similarly, it must be either at the end
of the line or followed by a non-word constituent character.
Word-constituent characters are letters, digits, and the
underscore.
-x, --line-regexp
Select only those matches that exactly match the whole line.
(-x is specified by POSIX.)
-y Obsolete synonym for -i.
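The matching-control options can be compared on one small file. This is an illustrative sketch assuming GNU grep; the log lines and file names are made up.

```shell
workdir=$(mktemp -d)
log="$workdir/log.txt"
printf '%s\n' 'Error: disk full' 'error code 42' 'no errors here' 'warning' > "$log"
printf '%s\n' 'disk' 'code' > "$workdir/patterns.txt"

ci_count=$(grep -c -i 'error' "$log")          # -i: three lines match in any case
whole_word=$(grep -w 'error' "$log")           # -w: "errors" and "Error" do not count
inverted=$(grep -v -i 'error' "$log")          # -v: lines that do NOT match
whole_line=$(grep -x 'warning' "$log")         # -x: the pattern must span the line
multi=$(grep -c -e 'disk' -e 'code' "$log")    # -e twice: either pattern
from_file=$(grep -c -f "$workdir/patterns.txt" "$log")   # -f: same patterns from a file

printf '%s\n' "$ci_count" "$whole_word" "$inverted" "$whole_line" "$multi" "$from_file"
rm -rf "$workdir"
```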
General Output Control
-c, --count
Suppress normal output; instead print a count of matching lines
for each input file. With the -v, --invert-match option (see
above), count non-matching lines. (-c is specified by POSIX.)
--color[=WHEN], --colour[=WHEN]
Surround the matched (non-empty) strings, matching lines,
context lines, file names, line numbers, byte offsets, and
separators (for fields and groups of context lines) with escape
sequences to display them in color on the terminal. The colors
are defined by the environment variable GREP_COLORS. The
deprecated environment variable GREP_COLOR is still supported,
but its setting does not have priority. WHEN is never, always,
or auto.
-L, --files-without-match
Suppress normal output; instead print the name of each input
file from which no output would normally have been printed. The
scanning will stop on the first match.
-l, --files-with-matches
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match. (-l is specified by
POSIX.)
-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines. If the input is
standard input from a regular file, and NUM matching lines are
output, grep ensures that the standard input is positioned to
just after the last matching line before exiting, regardless of
the presence of trailing context lines. This enables a calling
process to resume a search. When grep stops after NUM matching
lines, it outputs any trailing context lines. When the -c or
--count option is also used, grep does not output a count
greater than NUM. When the -v or --invert-match option is also
used, grep stops after outputting NUM non-matching lines.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
-q, --quiet, --silent
Quiet; do not write anything to standard output. Exit
immediately with zero status if any match is found, even if an
error was detected. Also see the -s or --no-messages option.
(-q is specified by POSIX.)
-s, --no-messages
Suppress error messages about nonexistent or unreadable files.
Portability note: unlike GNU grep, 7th Edition Unix grep did not
conform to POSIX, because it lacked -q and its -s option behaved
like GNU grep’s -q option. USG-style grep also lacked -q but
its -s option behaved like GNU grep. Portable shell scripts
should avoid both -q and -s and should redirect standard and
error output to /dev/null instead. (-s is specified by POSIX.)
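A short sketch of the output-control options follows, assuming GNU grep. The two sample files are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'a1 b22 c333' 'no digits' 'd4444' > nums.txt
printf '%s\n' 'plain text' > words.txt

count=$(grep -c '[0-9]' nums.txt)                    # lines with a digit
parts=$(grep -o '[0-9][0-9]*' nums.txt)              # each matched part on its own line
first=$(grep -m 1 '[0-9]' nums.txt)                  # stop after the first matching line
with_match=$(grep -l '[0-9]' nums.txt words.txt)     # names of files that match
without_match=$(grep -L '[0-9]' nums.txt words.txt)  # names of files that do not

# -q prints nothing; the exit status alone says whether a match exists.
if grep -q 'digits' nums.txt; then quiet=found; else quiet=missing; fi

printf '%s\n' "$count" "$parts" "$first" "$with_match" "$without_match" "$quiet"
cd / && rm -rf "$workdir"
```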
Output Line Prefix Control
-b, --byte-offset
Print the 0-based byte offset within the input file before each
line of output. If -o (--only-matching) is specified, print the
offset of the matching part itself.
-H, --with-filename
Print the file name for each match. This is the default when
there is more than one file to search.
-h, --no-filename
Suppress the prefixing of file names on output. This is the
default when there is only one file (or only standard input) to
search.
--label=LABEL
Display input actually coming from standard input as input
coming from file LABEL. This is especially useful when
implementing tools like zgrep, e.g., gzip -cd foo.gz | grep
--label=foo -H something. See also the -H option.
-n, --line-number
Prefix each line of output with the 1-based line number within
its input file. (-n is specified by POSIX.)
-T, --initial-tab
Make sure that the first character of actual line content lies
on a tab stop, so that the alignment of tabs looks normal. This
is useful with options that prefix their output to the actual
content: -H, -n, and -b. In order to improve the probability
that lines from a single file will all start at the same column,
this also causes the line number and byte offset (if present) to
be printed in a minimum size field width.
-u, --unix-byte-offsets
Report Unix-style byte offsets. This switch causes grep to
report byte offsets as if the file were a Unix-style text file,
i.e., with CR characters stripped off. This will produce
results identical to running grep on a Unix machine. This
option has no effect unless -b option is also used; it has no
effect on platforms other than MS-DOS and MS-Windows.
-Z, --null
Output a zero byte (the ASCII NUL character) instead of the
character that normally follows a file name. For example, grep
-lZ outputs a zero byte after each file name instead of the
usual newline. This option makes the output unambiguous, even
in the presence of file names containing unusual characters like
newlines. This option can be used with commands like find
-print0, perl -0, sort -z, and xargs -0 to process arbitrary
file names, even those that contain newline characters.
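The prefix options and the NUL-terminated file list can be sketched as below, assuming GNU grep, xargs, and basename. The file names, one of which deliberately contains spaces, are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'alpha\nbeta\ngamma\n' > plain.txt
printf 'beta\n' > 'name with space.txt'

line_no=$(grep -n 'beta' plain.txt)       # 1-based line number prefix
with_name=$(grep -Hn 'beta' plain.txt)    # file name and line number
offset=$(grep -b 'beta' plain.txt)        # 0-based byte offset of the line

# -Z ends each file name with a NUL byte, so xargs -0 receives the names
# intact even though one of them contains spaces.
names=$(grep -lZ 'beta' ./*.txt | xargs -0 -n 1 basename)

printf '%s\n' "$line_no" "$with_name" "$offset" "$names"
cd / && rm -rf "$workdir"
```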
Context Line Control
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines.
Places a line containing a group separator (described under
--group-separator) between contiguous groups of matches. With
the -o or --only-matching option, this has no effect and a
warning is given.
-B NUM, --before-context=NUM
Print NUM lines of leading context before matching lines.
Places a line containing a group separator (described under
--group-separator) between contiguous groups of matches. With
the -o or --only-matching option, this has no effect and a
warning is given.
-C NUM, -NUM, --context=NUM
Print NUM lines of output context. Places a line containing a
group separator (described under --group-separator) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
--group-separator=SEP
Use SEP as a group separator. By default SEP is double hyphen
(--).
--no-group-separator
Use empty string as a group separator.
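Context lines and the group separator are easiest to see with two matches that are far apart. A minimal sketch assuming a GNU grep that supports the separator options described above; the input lines are invented.

```shell
sample=$(mktemp)
printf '%s\n' l1 l2 MATCH l4 l5 l6 l7 MATCH l9 > "$sample"

after=$(grep -A 1 'MATCH' "$sample")      # one trailing context line per match
before=$(grep -B 1 'MATCH' "$sample")     # one leading context line per match
custom=$(grep -C 1 --group-separator='====' 'MATCH' "$sample")
plain=$(grep -A 1 --no-group-separator 'MATCH' "$sample")

printf '%s\n' "$after" '' "$before" '' "$custom" '' "$plain"
rm -f "$sample"
```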
File and Directory Selection
-a, --text
Process a binary file as if it were text; this is equivalent to
the --binary-files=text option.
--binary-files=TYPE
If the first few bytes of a file indicate that the file contains
binary data, assume that the file is of type TYPE. By default,
TYPE is binary, and grep normally outputs either a one-line
message saying that a binary file matches, or no message if
there is no match. If TYPE is without-match, grep assumes that
a binary file does not match; this is equivalent to the -I
option. If TYPE is text, grep processes a binary file as if it
were text; this is equivalent to the -a option. Warning: grep
--binary-files=text might output binary garbage, which can have
nasty side effects if the output is a terminal and if the
terminal driver interprets some of it as commands.
-D ACTION, --devices=ACTION
If an input file is a device, FIFO or socket, use ACTION to
process it. By default, ACTION is read, which means that
devices are read just as if they were ordinary files. If ACTION
is skip, devices are silently skipped.
-d ACTION, --directories=ACTION
If an input file is a directory, use ACTION to process it. By
default, ACTION is read, i.e., read directories just as if they
were ordinary files. If ACTION is skip, silently skip
directories. If ACTION is recurse, read all files under each
directory, recursively, following symbolic links only if they
are on the command line. This is equivalent to the -r option.
--exclude=GLOB
Skip files whose base name matches GLOB (using wildcard
matching). A file-name glob can use *, ?, and [...] as
wildcards, and \ to quote a wildcard or backslash character
literally.
--exclude-from=FILE
Skip files whose base name matches any of the file-name globs
read from FILE (using wildcard matching as described under
--exclude).
--exclude-dir=DIR
Exclude directories matching the pattern DIR from recursive
searches.
-I Process a binary file as if it did not contain matching data;
this is equivalent to the --binary-files=without-match option.
--include=GLOB
Search only files whose base name matches GLOB (using wildcard
matching as described under --exclude).
-r, --recursive
Read all files under each directory, recursively, following
symbolic links only if they are on the command line. This is
equivalent to the -d recurse option.
-R, --dereference-recursive
Read all files under each directory, recursively. Follow all
symbolic links, unlike -r.
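A recursive search restricted by file name and directory might look like this. The directory layout is hypothetical and the sketch assumes GNU grep.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p src vendor
printf 'TODO: fix\n'      > src/main.c
printf 'TODO: docs\n'     > src/notes.txt
printf 'TODO: upstream\n' > vendor/lib.c

# Everything under the current directory.
all=$(grep -rl 'TODO' . | sort)
# Only *.c files, and never descend into a directory named vendor.
filtered=$(grep -rl 'TODO' --include='*.c' --exclude-dir=vendor .)

printf '%s\n' "$all" '' "$filtered"
cd / && rm -rf "$workdir"
```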
Other Options
--line-buffered
Use line buffering on output. This can cause a performance
penalty.
-U, --binary
Treat the file(s) as binary. By default, under MS-DOS and MS-
Windows, grep guesses the file type by looking at the contents
of the first 32KB read from the file. If grep decides the file
is a text file, it strips the CR characters from the original
file contents (to make regular expressions with ^ and $ work
correctly). Specifying -U overrules this guesswork, causing all
files to be read and passed to the matching mechanism verbatim;
if the file is a text file with CR/LF pairs at the end of each
line, this will cause some regular expressions to fail. This
option has no effect on platforms other than MS-DOS and MS-
Windows.
-z, --null-data
Treat the input as a set of lines, each terminated by a zero
byte (the ASCII NUL character) instead of a newline. Like the
-Z or --null option, this option can be used with commands like
sort -z to process arbitrary file names.
REGULAR EXPRESSIONS
A regular expression is a pattern that describes a set of strings.
Regular expressions are constructed analogously to arithmetic
expressions, by using various operators to combine smaller expressions.
grep understands three different versions of regular expression syntax:
“basic,” “extended” and “perl.” In GNU grep, there is no difference in
available functionality between basic and extended syntaxes. In other
implementations, basic regular expressions are less powerful. The
following description applies to extended regular expressions;
differences for basic regular expressions are summarized afterwards.
Perl regular expressions give additional functionality, and are
documented in pcresyntax(3) and pcrepattern(3), but may not be
available on every system.
The fundamental building blocks are the regular expressions that match
a single character. Most characters, including all letters and digits,
are regular expressions that match themselves. Any meta-character with
special meaning may be quoted by preceding it with a backslash.
The period . matches any single character.
Character Classes and Bracket Expressions
A bracket expression is a list of characters enclosed by [ and ]. It
matches any single character in that list; if the first character of
the list is the caret ^ then it matches any character not in the list.
For example, the regular expression [0123456789] matches any single
digit.
Within a bracket expression, a range expression consists of two
characters separated by a hyphen. It matches any single character that
sorts between the two characters, inclusive, using the locale’s
collating sequence and character set. For example, in the default C
locale, [a-d] is equivalent to [abcd]. Many locales sort characters in
dictionary order, and in these locales [a-d] is typically not
equivalent to [abcd]; it might be equivalent to [aBbCcDd], for example.
To obtain the traditional interpretation of bracket expressions, you
can use the C locale by setting the LC_ALL environment variable to the
value C.
Finally, certain named classes of characters are predefined within
bracket expressions, as follows. Their names are self explanatory, and
they are [:alnum:], [:alpha:], [:cntrl:], [:digit:], [:graph:],
[:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:].
For example, [[:alnum:]] means the character class of numbers and
letters in the current locale. In the C locale and ASCII character set
encoding, this is the same as [0-9A-Za-z]. (Note that the brackets in
these class names are part of the symbolic names, and must be included
in addition to the brackets delimiting the bracket expression.) Most
meta-characters lose their special meaning inside bracket expressions.
To include a literal ] place it first in the list. Similarly, to
include a literal ^ place it anywhere but first. Finally, to include a
literal - place it last.
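The bracket-expression rules above can be checked with a few one-liners. A small sketch; LC_ALL=C is set where the result depends on the collating sequence, and the inputs are invented.

```shell
# Named class: count lines that contain at least one letter or digit.
alnum_lines=$(printf '%s\n' 'abc123' 'ABC' '---' | LC_ALL=C grep -c '[[:alnum:]]')

# Range in the C locale: [a-d] means exactly a, b, c, d (lowercase only).
range_lines=$(printf '%s\n' a B c E | LC_ALL=C grep -c '[a-d]')

# Literal ] goes first, literal - goes last, literal ^ anywhere but first.
bracket=$(printf '%s\n' 'x]y' 'xy' | grep -c '[]]')
hyphen=$(printf '%s\n' 'a-b' 'ab' | grep -c '[x-]')
caret=$(printf '%s\n' 'a^b' 'ab' | grep -c '[x^]')

printf '%s\n' "$alnum_lines" "$range_lines" "$bracket" "$hyphen" "$caret"
```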
Anchoring
The caret ^ and the dollar sign $ are meta-characters that respectively
match the empty string at the beginning and end of a line.
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the
beginning and end of a word. The symbol \b matches the empty string at
the edge of a word, and \B matches the empty string provided it’s not
at the edge of a word. The symbol \w is a synonym for [_[:alnum:]] and
\W is a synonym for [^_[:alnum:]].
Repetition
A regular expression may be followed by one of several repetition
operators:
? The preceding item is optional and matched at most once.
* The preceding item will be matched zero or more times.
+ The preceding item will be matched one or more times.
{n} The preceding item is matched exactly n times.
{n,} The preceding item is matched n or more times.
{,m} The preceding item is matched at most m times. This is a GNU
extension.
{n,m} The preceding item is matched at least n times, but not more
than m times.
Concatenation
Two regular expressions may be concatenated; the resulting regular
expression matches any string formed by concatenating two substrings
that respectively match the concatenated expressions.
Alternation
Two regular expressions may be joined by the infix operator |; the
resulting regular expression matches any string matching either
alternate expression.
Precedence
Repetition takes precedence over concatenation, which in turn takes
precedence over alternation. A whole expression may be enclosed in
parentheses to override these precedence rules and form a
subexpression.
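Repetition, alternation, and precedence can be illustrated with extended syntax (-E). In this sketch -x forces whole-line matches so the effect of each operator is visible; the word lists are made up.

```shell
# ? makes the preceding item optional.
optional=$(printf '%s\n' color colour colouur | grep -E -x 'colou?r')
# {2,3} repeats the parenthesized group two or three times.
interval=$(printf '%s\n' ab abab ababab abababab | grep -E -x '(ab){2,3}')
# Alternation binds loosest: 'ab|cd' means "ab" or "cd" ...
loose=$(printf '%s\n' ab cd abd acd | grep -E -x 'ab|cd')
# ... while parentheses limit the choice to the middle character.
grouped=$(printf '%s\n' ab cd abd acd | grep -E -x 'a(b|c)d')

printf '%s\n' "$optional" "$interval" "$loose" "$grouped"
```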
Back References and Subexpressions
The back-reference \n, where n is a single digit, matches the substring
previously matched by the nth parenthesized subexpression of the
regular expression.
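Two small back-reference examples follow, assuming GNU grep. The first uses basic syntax; the second relies on -E together with the \w, \< and \> symbols described above. The sample words are invented.

```shell
# \(.\)\1 : any character followed by the same character again.
doubled=$(printf '%s\n' hello world book | grep '\(.\)\1')

# A whole word followed by a space and the same word again.
repeated_word=$(printf '%s\n' 'the the cat' 'the cat' | grep -E '\<(\w+) \1\>')

printf '%s\n' "$doubled" "$repeated_word"
```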
Basic vs Extended Regular Expressions
In basic regular expressions the meta-characters ?, +, {, |, (, and )
lose their special meaning; instead use the backslashed versions \?,
\+, \{, \|, \(, and \).
Traditional egrep did not support the { meta-character, and some egrep
implementations support \{ instead, so portable scripts should avoid {
in grep -E patterns and should use [{] to match a literal {.
GNU grep -E attempts to support traditional usage by assuming that { is
not special if it would be the start of an invalid interval
specification. For example, the command grep -E '{1' searches for the
two-character string {1 instead of reporting a syntax error in the
regular expression. POSIX allows this behavior as an extension, but
portable scripts should avoid it.
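The effect of the backslash in each syntax can be seen by running the same two patterns with and without -E. A minimal sketch assuming GNU grep; the three input lines are made up.

```shell
sample=$(mktemp)
printf '%s\n' 'aa' 'a+' 'b' > "$sample"

bre_plain=$(grep -c 'a+' "$sample")       # basic: + is literal, only "a+" matches
bre_escaped=$(grep -c 'a\+' "$sample")    # basic: \+ means one or more
ere_plain=$(grep -E -c 'a+' "$sample")    # extended: + means one or more
ere_escaped=$(grep -E -c 'a\+' "$sample") # extended: \+ is a literal +

printf '%s\n' "$bre_plain" "$bre_escaped" "$ere_plain" "$ere_escaped"
rm -f "$sample"
```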
ENVIRONMENT VARIABLES
The behavior of grep is affected by the following environment
variables.
The locale for category LC_foo is specified by examining the three
environment variables LC_ALL, LC_foo, LANG, in that order. The first
of these variables that is set specifies the locale. For example, if
LC_ALL is not set, but LC_MESSAGES is set to pt_BR, then the Brazilian
Portuguese locale is used for the LC_MESSAGES category. The C locale
is used if none of these environment variables are set, if the locale
catalog is not installed, or if grep was not compiled with national
language support (NLS).
GREP_OPTIONS
This variable specifies default options to be placed in front of
any explicit options. For example, if GREP_OPTIONS is
'--binary-files=without-match --directories=skip', grep behaves
as if the two options --binary-files=without-match and
--directories=skip had been specified before any explicit
options. Option specifications are separated by whitespace. A
backslash escapes the next character, so it can be used to
specify an option containing whitespace or a backslash.
GREP_COLOR
This variable specifies the color used to highlight matched
(non-empty) text. It is deprecated in favor of GREP_COLORS, but
still supported. The mt, ms, and mc capabilities of GREP_COLORS
have priority over it. It can only specify the color used to
highlight the matching non-empty text in any matching line (a
selected line when the -v command-line option is omitted, or a
context line when -v is specified). The default is 01;31, which
means a bold red foreground text on the terminal’s default
background.
GREP_COLORS
Specifies the colors and other attributes used to highlight
various parts of the output. Its value is a colon-separated
list of capabilities that defaults to
ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36 with the rv
and ne boolean capabilities omitted (i.e., false). Supported
capabilities are as follows.
sl= SGR substring for whole selected lines (i.e., matching
lines when the -v command-line option is omitted, or non-
matching lines when -v is specified). If however the
boolean rv capability and the -v command-line option are
both specified, it applies to context matching lines
instead. The default is empty (i.e., the terminal’s
default color pair).
cx= SGR substring for whole context lines (i.e., non-matching
lines when the -v command-line option is omitted, or
matching lines when -v is specified). If however the
boolean rv capability and the -v command-line option are
both specified, it applies to selected non-matching lines
instead. The default is empty (i.e., the terminal’s
default color pair).
rv Boolean value that reverses (swaps) the meanings of the
sl= and cx= capabilities when the -v command-line option
is specified. The default is false (i.e., the capability
is omitted).
mt=01;31
SGR substring for matching non-empty text in any matching
line (i.e., a selected line when the -v command-line
option is omitted, or a context line when -v is
specified). Setting this is equivalent to setting both
ms= and mc= at once to the same value. The default is a
bold red text foreground over the current line
background.
ms=01;31
SGR substring for matching non-empty text in a selected
line. (This is only used when the -v command-line option
is omitted.) The effect of the sl= (or cx= if rv)
capability remains active when this kicks in. The
default is a bold red text foreground over the current
line background.
mc=01;31
SGR substring for matching non-empty text in a context
line. (This is only used when the -v command-line option
is specified.) The effect of the cx= (or sl= if rv)
capability remains active when this kicks in. The
default is a bold red text foreground over the current
line background.
fn=35 SGR substring for file names prefixing any content line.
The default is a magenta text foreground over the
terminal’s default background.
ln=32 SGR substring for line numbers prefixing any content
line. The default is a green text foreground over the
terminal’s default background.
bn=32 SGR substring for byte offsets prefixing any content
line. The default is a green text foreground over the
terminal’s default background.
se=36 SGR substring for separators that are inserted between
selected line fields (:), between context line fields,
(-), and between groups of adjacent lines when nonzero
context is specified (--). The default is a cyan text
foreground over the terminal’s default background.
ne Boolean value that prevents clearing to the end of line
using Erase in Line (EL) to Right (\33[K) each time a
colorized item ends. This is needed on terminals on
which EL is not supported. It is otherwise useful on
terminals for which the back_color_erase (bce) boolean
terminfo capability does not apply, when the chosen
highlight colors do not affect the background, or when EL
is too slow or causes too much flicker. The default is
false (i.e., the capability is omitted).
Note that boolean capabilities have no =... part. They are
omitted (i.e., false) by default and become true when specified.
See the Select Graphic Rendition (SGR) section in the
documentation of the text terminal that is used for permitted
values and their meaning as character attributes. These
substring values are integers in decimal representation and can
be concatenated with semicolons. grep takes care of assembling
the result into a complete SGR sequence (\33[...m). Common
values to concatenate include 1 for bold, 4 for underline, 5 for
blink, 7 for inverse, 39 for default foreground color, 30 to 37
for foreground colors, 90 to 97 for 16-color mode foreground
colors, 38;5;0 to 38;5;255 for 88-color and 256-color modes
foreground colors, 49 for default background color, 40 to 47 for
background colors, 100 to 107 for 16-color mode background
colors, and 48;5;0 to 48;5;255 for 88-color and 256-color modes
background colors.
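To see what a GREP_COLORS setting does, the escape byte can be made visible. This sketch assumes GNU grep; the choice of ms=01;32 (bold green) and the ESC placeholder text are arbitrary.

```shell
esc=$(printf '\033')

# --color=always forces coloring even though the output is not a terminal.
colored=$(printf 'abc\n' | GREP_COLORS='ms=01;32' grep --color=always 'b')

# Replace the raw escape byte with the text ESC so the sequences can be read.
visible=$(printf '%s' "$colored" | sed "s/$esc/ESC/g")
printf '%s\n' "$visible"
```

The printed line shows the SGR sequence built from the ms= value around the matched b.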
LC_ALL, LC_COLLATE, LANG
These variables specify the locale for the LC_COLLATE category,
which determines the collating sequence used to interpret range
expressions like [a-z].
LC_ALL, LC_CTYPE, LANG
These variables specify the locale for the LC_CTYPE category,
which determines the type of characters, e.g., which characters
are whitespace.
LC_ALL, LC_MESSAGES, LANG
These variables specify the locale for the LC_MESSAGES category,
which determines the language that grep uses for messages. The
default C locale uses American English messages.
POSIXLY_CORRECT
If set, grep behaves as POSIX requires; otherwise, grep behaves
more like other GNU programs. POSIX requires that options that
follow file names must be treated as file names; by default,
such options are permuted to the front of the operand list and
are treated as options. Also, POSIX requires that unrecognized
options be diagnosed as “illegal”, but since they are not really
against the law the default is to diagnose them as “invalid”.
POSIXLY_CORRECT also disables _N_GNU_nonoption_argv_flags_,
described below.
_N_GNU_nonoption_argv_flags_
(Here N is grep’s numeric process ID.) If the ith character of
this environment variable’s value is 1, do not consider the ith
operand of grep to be an option, even if it appears to be one.
A shell can put this variable in the environment for each
command it runs, specifying which operands are the results of
file name wildcard expansion and therefore should not be treated
as options. This behavior is available only with the GNU C
library, and only when POSIXLY_CORRECT is not set.
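Two of the variables above can be tried out directly from a shell. The following is a minimal sketch with made-up input: the first command forces the C locale so that the range [a-z] covers only the ASCII lowercase letters, and the second sets the ms= capability of GREP_COLORS to bold, underlined red. The second command assumes GNU grep; other implementations may ignore the variable or reject --color.

```shell
# In the C locale, [a-z] is exactly the 26 ASCII lowercase letters,
# so only the lowercase line is selected.
matched=$(printf 'abc\nABC\n' | LC_ALL=C grep '[a-z]')
printf '%s\n' "$matched"

# GREP_COLORS: show matched text (ms=) as bold (1), underlined (4), red (31).
# grep assembles the SGR sequence itself; assumes GNU grep.
printf 'an error here\n' | GREP_COLORS='ms=1;4;31' grep --color=always 'error'
```

In a locale other than C, the same range expression may select a different set of characters, which is why scripts that depend on ASCII ranges often set LC_ALL=C for the grep call only.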
EXIT STATUS
Normally, the exit status is 0 if selected lines are found and 1
otherwise. But the exit status is 2 if an error occurred, unless the
-q or --quiet or --silent option is used and a selected line is found.
Note, however, that POSIX only mandates, for programs such as grep,
cmp, and diff, that the exit status in case of error be greater than 1;
it is therefore advisable, for the sake of portability, to use logic
that tests for this general condition instead of strict equality
with 2.
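The portability advice above can be sketched as a small shell function. The name check_grep and the sample file are hypothetical, chosen only for this illustration; the point is that the error branch tests for "anything other than 0 or 1" rather than for exactly 2.

```shell
# Classify grep's exit status: 0 = match, 1 = no match, >1 = error.
# check_grep is a hypothetical helper name used only in this sketch.
check_grep() {
    grep -q -- "$1" "$2" 2>/dev/null
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "match"
    elif [ "$status" -eq 1 ]; then
        echo "no match"
    else
        # POSIX only guarantees "greater than 1" here, so do not test for 2.
        echo "error"
    fi
}

tmp=$(mktemp)
printf 'alpha\nbeta\n' > "$tmp"

r_match=$(check_grep alpha "$tmp")
r_none=$(check_grep gamma "$tmp")
r_error=$(check_grep alpha "$tmp.does-not-exist")
printf '%s\n%s\n%s\n' "$r_match" "$r_none" "$r_error"
rm -f "$tmp"
```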
COPYRIGHT
Copyright 1998-2000, 2002, 2005-2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is
NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
BUGS
Reporting Bugs
Email bug reports to <bug-grep@gnu.org>, a mailing list whose web page
is <http://lists.gnu.org/mailman/listinfo/bug-grep>. grep’s Savannah
bug tracker is located at <http://savannah.gnu.org/bugs/?group=grep>.
Known Bugs
Large repetition counts in the {n,m} construct may cause grep to use
lots of memory. In addition, certain other obscure regular expressions
require exponential time and space, and may cause grep to run out of
memory.
Back-references are very slow, and may require exponential time.
SEE ALSO
Regular Manual Pages
awk(1), cmp(1), diff(1), find(1), gzip(1), perl(1), sed(1), sort(1),
xargs(1), zgrep(1), read(2), pcre(3), pcresyntax(3), pcrepattern(3),
terminfo(5), glob(7), regex(7).
POSIX Programmer’s Manual Page
grep(1p).
TeXinfo Documentation
The full documentation for grep is maintained as a TeXinfo manual,
which you can read at http://www.gnu.org/software/grep/manual/. If the
info and grep programs are properly installed at your site, the command
info grep
should give you access to the complete manual.
NOTES
This man page is maintained only fitfully; the full documentation is
often more up-to-date.
GNU’s not Unix, but Unix is a beast; its plural form is Unixen.
User Commands GNU grep 2.20 GREP(1)
cat /var/log/messages | grep -i error
egrep -i 'error|fail|warn' /var/log/messages
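Basic Linux Commands - grep
The grep command searches files for a given pattern or string and writes every line that contains the pattern to standard output. It can also list the files in a directory that contain the pattern. grep is often used to locate variable or function names across one or more files, for example when modifying a program.
$ grep [options] [pattern (string or regular expression)] [file name]
Here "pattern" is the word or phrase you want to find, and "file name" is the file to search. A string is one or more characters; a single character is a string just as a word or a sentence is. A string may contain spaces, punctuation, and non-printing (control) characters. If you name a file, grep searches only that file. If you use a shell metacharacter such as * as the file name, the shell expands it to every file in the current directory, so grep searches all of them.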
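$ grep "run[- ]time" ch04
This searches the file ch04 for lines that contain run-time or run time. When working with biological data, the same approach finds lines that contain a specific keyword, such as the name of a gene.
$ ps aux | grep python
This finds the lines that mention python among the currently running processes. Piping (|) the output of ps, which lists running processes, into grep picks out the processes with a particular name from a long process list.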
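Options
The most commonly used options are -i, -c, and -w. -i searches without distinguishing upper and lower case, which helps when the exact capitalization of a word is not known. -c returns only the number of matching lines, which is handy for counting. -w matches the pattern only as a whole word: grep if also matches lines containing diff or knife, whereas grep -w if does not. -l returns only the names of the files that contain a match. By default the matching lines are printed as well, so with many files you would have to pick out the names by eye; -l gives a clean list of names that can be passed on to other commands. In many other commands -v means verbose, but in grep it prints only the lines that do not match.
$ ps aux | grep python | grep -v grep
The grep python process has python on its own command line, so it appears in the ps output as well; the trailing grep -v grep removes that line.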
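- -c : print the number of lines that match the pattern.
- -i : ignore case when comparing.
- -v : print only the lines that do not match the pattern.
- -n : print the line number with each matching line.
- -l : print the names of the files that contain the pattern.
- -w : print only the lines where the pattern matches a whole word.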
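The options listed above can be compared side by side on a small file. This is a minimal sketch; the file notes.txt and its contents are made up for illustration.

```shell
# Hypothetical sample file for illustration.
dir=$(mktemp -d)
cat > "$dir/notes.txt" <<'EOF'
if x then y
diff a b
IF in capitals
a sharp knife
EOF

count_all=$(grep -c 'if' "$dir/notes.txt")       # lines containing "if" anywhere
count_word=$(grep -cw 'if' "$dir/notes.txt")     # "if" as a whole word only
count_icase=$(grep -ciw 'if' "$dir/notes.txt")   # whole word, ignoring case
numbered=$(grep -n 'knife' "$dir/notes.txt")     # matching line with its number
inverted=$(grep -vc 'if' "$dir/notes.txt")       # lines that do NOT contain "if"
files=$(cd "$dir" && grep -l 'diff' notes.txt)   # names of files with a match

echo "all=$count_all word=$count_word icase=$count_icase inverted=$inverted"
echo "$numbered"
echo "$files"
rm -rf "$dir"
```

Without -w the pattern if is also found inside diff and knife, which is why the first count is higher than the second.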
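Tips
grep is often used as a "filter" together with other commands: it lets you strip unneeded information from the output of a command. To use grep as a filter, pipe (|) the output of the command into grep.
Regular-expression forms that are commonly used in grep search patterns are listed at the end of this section.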
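When working with biological data, you often need to count the sequences in a sequence file such as FASTA or GenBank, and grep is very useful for this. A FASTA record has two parts, a header and the sequence, so counting the headers counts the sequences. A header looks like
>seq_name description
so it is enough to count the lines that begin with ">".
$ grep -c "^>" seq_file
This prints the number of sequences in the FASTA file. Because -c counts matching lines rather than individual matches, and ^ anchors the match to the start of the line, a ">" inside a description does not distort the count. Without -c, grep prints the header lines themselves instead of a count.
A GenBank record consists of header, journal, taxonomy, feature, and sequence parts, and ends with a line containing "//". In other words, each sequence runs from its header (LOCUS, ACCESSION, ...) to "//", so counting the lines that begin with "//" gives the number of sequences in a GenBank file.
$ grep -c "^//" seq_file
To see which format (GenBank, Swiss-Prot, FASTA) a sequence file uses, check it with the head command. Rather than opening the whole file, look at the first part with head -100 seq_file, confirm the format, and then count the sequences with the matching command above.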
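The counting approach can be checked on two tiny files. The records below are invented for illustration and are not real database entries; the sketch anchors both patterns to the start of the line so that a ">" in a FASTA description or a "//" inside a URL is not counted.

```shell
dir=$(mktemp -d)

# Made-up FASTA file with two records.
cat > "$dir/sample.fasta" <<'EOF'
>seq1 first sequence
ACGTACGT
ACGT
>seq2 second sequence, with a > inside the description
GGGTTT
EOF

# Made-up GenBank-like file with two records, each ended by "//".
cat > "$dir/sample.gb" <<'EOF'
LOCUS       AB000001
DEFINITION  see http://example.org//path for details
ORIGIN
        1 acgtacgt
//
LOCUS       AB000002
ORIGIN
        1 ggttaacc
//
EOF

n_fasta=$(grep -c '^>' "$dir/sample.fasta")
n_genbank=$(grep -c '^//' "$dir/sample.gb")
echo "FASTA records: $n_fasta"
echo "GenBank records: $n_genbank"
rm -rf "$dir"
```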
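$ grep 'apple$' file
= $ anchors the match to the end of a line. Finds the lines in the file that end with apple.
$ grep 'app*' file
= Finds the lines that contain ap followed by zero or more p (ap, app, appp, ...). Here * repeats the preceding character; it is not the shell wildcard.
$ grep 'a.....e' file
= Finds the lines that contain a 7-character sequence starting with a and ending with e. Each . matches any single character.
$ grep '[a-d]' file
= Finds the lines that contain any of the characters a, b, c, or d.
$ grep '[aA]pple' file
= Finds the lines that contain apple or Apple.
$ grep 'apple' d*
= Finds all lines that contain apple in every file whose name starts with d.
$ grep 'apple' file1 file2
= Finds all lines that contain apple in the two named files.
$ grep '^[ab]' file
= Finds all lines in the file that begin with a or b.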
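The patterns above can be tried against a short word list. This is a minimal sketch with a made-up list; grep -c is used so that each pattern reports how many lines it selects.

```shell
# Hypothetical word list, one entry per line.
words=$(printf '%s\n' apple Apple pineapple apricot average banana 'cherry pie')

ends_apple=$(printf '%s\n' "$words" | grep -c 'apple$')     # lines ending in "apple"
ap_then_p=$(printf '%s\n' "$words" | grep -c 'app*')        # "ap" plus zero or more "p"
seven=$(printf '%s\n' "$words" | grep -c 'a.....e')         # a, any 5 characters, e
starts_ab=$(printf '%s\n' "$words" | grep -c '^[ab]')       # lines starting with a or b
either_case=$(printf '%s\n' "$words" | grep -c '[aA]pple')  # apple or Apple

echo "apple\$=$ends_apple app*=$ap_then_p a.....e=$seven ^[ab]=$starts_ab [aA]pple=$either_case"
```

Note that app* also selects apricot, because the pattern only requires ap; to find words that begin with app, a pattern such as '^app' or '\<app' (GNU grep) is closer to the intent.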
GAWK(1) Utility Commands GAWK(1)
NAME
gawk - pattern scanning and processing language
SYNOPSIS
gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
gawk [ POSIX or GNU style options ] [ -- ] program-text file ...
pgawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
pgawk [ POSIX or GNU style options ] [ -- ] program-text file ...
DESCRIPTION
Gawk is the GNU Project’s implementation of the AWK programming lan-
guage. It conforms to the definition of the language in the POSIX
1003.1 Standard. This version in turn is based on the description in
The AWK Programming Language, by Aho, Kernighan, and Weinberger, with
the additional features found in the System V Release 4 version of UNIX
awk. Gawk also provides more recent Bell Laboratories awk extensions,
and a number of GNU-specific extensions.
Pgawk is the profiling version of gawk. It is identical in every way
to gawk, except that programs run more slowly, and it automatically
produces an execution profile in the file awkprof.out when done. See
the --profile option, below.
The command line consists of options to gawk itself, the AWK program
text (if not supplied via the -f or --file options), and values to be
made available in the ARGC and ARGV pre-defined AWK variables.
OPTION FORMAT
Gawk options may be either traditional POSIX one letter options, or
GNU-style long options. POSIX options start with a single “-”, while
long options start with “--”. Long options are provided for both GNU-
specific features and for POSIX-mandated features.
Following the POSIX standard, gawk-specific options are supplied via
arguments to the -W option. Multiple -W options may be supplied. Each
-W option has a corresponding long option, as detailed below. Argu-
ments to long options are either joined with the option by an = sign,
with no intervening spaces, or they may be provided in the next command
line argument. Long options may be abbreviated, as long as the abbre-
viation remains unique.
OPTIONS
Gawk accepts the following options, listed by frequency.
-F fs
--field-separator fs
Use fs for the input field separator (the value of the FS prede-
fined variable).
-v var=val
--assign var=val
Assign the value val to the variable var, before execution of
the program begins. Such variable values are available to the
BEGIN block of an AWK program.
-f program-file
--file program-file
Read the AWK program source from the file program-file, instead
of from the first command line argument. Multiple -f (or
--file) options may be used.
-mf NNN
-mr NNN
Set various memory limits to the value NNN. The f flag sets the
maximum number of fields, and the r flag sets the maximum record
size. These two flags and the -m option are from an earlier
version of the Bell Laboratories research version of UNIX awk.
They are ignored by gawk, since gawk has no pre-defined limits.
(Current versions of the Bell Laboratories awk no longer accept
them.)
-O
--optimize
Enable optimizations upon the internal representation of the
program. Currently, this includes just simple constant-folding.
The gawk maintainer hopes to add additional optimizations over
time.
-W compat
-W traditional
--compat
--traditional
Run in compatibility mode. In compatibility mode, gawk behaves
identically to UNIX awk; none of the GNU-specific extensions are
recognized. The use of --traditional is preferred over the
other forms of this option. See GNU EXTENSIONS, below, for more
information.
-W copyleft
-W copyright
--copyleft
--copyright
Print the short version of the GNU copyright information message
on the standard output and exit successfully.
-W dump-variables[=file]
--dump-variables[=file]
Print a sorted list of global variables, their types and final
values to file. If no file is provided, gawk uses a file named
awkvars.out in the current directory.
Having a list of all the global variables is a good way to look
for typographical errors in your programs. You would also use
this option if you have a large program with a lot of functions,
and you want to be sure that your functions don’t inadvertently
use global variables that you meant to be local. (This is a
particularly easy mistake to make with simple variable names
like i, j, and so on.)
-W exec file
--exec file
Similar to -f, however, this option is the last one processed.
This should be used with #! scripts, particularly for
CGI applications, to avoid passing in options or source code (!)
on the command line from a URL. This option disables command-
line variable assignments.
-W gen-po
--gen-po
Scan and parse the AWK program, and generate a GNU .po format
file on standard output with entries for all localizable strings
in the program. The program itself is not executed. See the
GNU gettext distribution for more information on .po files.
-W help
-W usage
--help
--usage
Print a relatively short summary of the available options on the
standard output. (Per the GNU Coding Standards, these options
cause an immediate, successful exit.)
-W lint[=value]
--lint[=value]
Provide warnings about constructs that are dubious or non-
portable to other AWK implementations. With an optional argu-
ment of fatal, lint warnings become fatal errors. This may be
drastic, but its use will certainly encourage the development of
cleaner AWK programs. With an optional argument of invalid,
only warnings about things that are actually invalid are issued.
(This is not fully implemented yet.)
-W lint-old
--lint-old
Provide warnings about constructs that are not portable to the
original version of Unix awk.
-W non-decimal-data
--non-decimal-data
Recognize octal and hexadecimal values in input data. Use this
option with great caution!
-W posix
--posix
This turns on compatibility mode, with the following additional
restrictions:
· \x escape sequences are not recognized.
· Only space and tab act as field separators when FS is set to a
single space, newline does not.
· You cannot continue lines after ? and :.
· The synonym func for the keyword function is not recognized.
· The operators ** and **= cannot be used in place of ^ and ^=.
· The fflush() function is not available.
-W profile[=prof_file]
--profile[=prof_file]
Send profiling data to prof_file. The default is awkprof.out.
When run with gawk, the profile is just a “pretty printed” ver-
sion of the program. When run with pgawk, the profile contains
execution counts of each statement in the program in the left
margin and function call counts for each user-defined function.
-W re-interval
--re-interval
Enable the use of interval expressions in regular expression
matching (see Regular Expressions, below). Interval expressions
were not traditionally available in the AWK language. The POSIX
standard added them, to make awk and egrep consistent with each
other. However, their use is likely to break old AWK programs,
so gawk only provides them if they are requested with this
option, or when --posix is specified.
-W source program-text
--source program-text
Use program-text as AWK program source code. This option allows
the easy intermixing of library functions (used via the -f and
--file options) with source code entered on the command line.
It is intended primarily for medium to large AWK programs used
in shell scripts.
-W use-lc-numeric
--use-lc-numeric
This forces gawk to use the locale’s decimal point character
when parsing input data. Although the POSIX standard requires
this behavior, and gawk does so when --posix is in effect, the
default is to follow traditional behavior and use a period as
the decimal point, even in locales where the period is not the
decimal point character. This option overrides the default
behavior, without the full draconian strictness of the --posix
option.
-W version
--version
Print version information for this particular copy of gawk on
the standard output. This is useful mainly for knowing if the
current copy of gawk on your system is up to date with respect
to whatever the Free Software Foundation is distributing. This
is also useful when reporting bugs. (Per the GNU Coding Stan-
dards, these options cause an immediate, successful exit.)
-- Signal the end of options. This is useful to allow further argu-
ments to the AWK program itself to start with a “-”. This pro-
vides consistency with the argument parsing convention used by
most other POSIX programs.
In compatibility mode, any other options are flagged as invalid, but
are otherwise ignored. In normal operation, as long as program text
has been supplied, unknown options are passed on to the AWK program in
the ARGV array for processing. This is particularly useful for running
AWK programs via the “#!” executable interpreter mechanism.
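The three most frequently used options from the list above, -F, -v, and -f, can be sketched as follows. The data file, the variable name prefix, and the program file third.awk are hypothetical, and the commands call awk on the assumption that it is gawk or another POSIX awk.

```shell
dir=$(mktemp -d)
# Hypothetical colon-separated data, similar in shape to /etc/passwd.
printf 'root:x:0\nalice:x:1000\n' > "$dir/users.txt"

# -F sets the field separator FS; -v assigns a variable before BEGIN runs.
inline=$(awk -F: -v prefix='user:' '
    BEGIN { print prefix "start" }
    { print prefix $1 }
' "$dir/users.txt")

# -f reads the program from a file instead of the command line.
printf '{ print $3 }\n' > "$dir/third.awk"
from_file=$(awk -F: -f "$dir/third.awk" "$dir/users.txt")

printf '%s\n' "$inline" "$from_file"
rm -rf "$dir"
```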
AWK PROGRAM EXECUTION
An AWK program consists of a sequence of pattern-action statements and
optional function definitions.
pattern { action statements }
function name(parameter list) { statements }
Gawk first reads the program source from the program-file(s) if speci-
fied, from arguments to --source, or from the first non-option argument
on the command line. The -f and --source options may be used multiple
times on the command line. Gawk reads the program text as if all the
program-files and command line source texts had been concatenated
together. This is useful for building libraries of AWK functions,
without having to include them in each new AWK program that uses them.
It also provides the ability to mix library functions with command line
programs.
The environment variable AWKPATH specifies a search path to use when
finding source files named with the -f option. If this variable does
not exist, the default path is ".:/usr/local/share/awk". (The actual
directory may vary, depending upon how gawk was built and installed.)
If a file name given to the -f option contains a “/” character, no path
search is performed.
Gawk executes AWK programs in the following order. First, all variable
assignments specified via the -v option are performed. Next, gawk com-
piles the program into an internal form. Then, gawk executes the code
in the BEGIN block(s) (if any), and then proceeds to read each file
named in the ARGV array. If there are no files named on the command
line, gawk reads the standard input.
If a filename on the command line has the form var=val it is treated as
a variable assignment. The variable var will be assigned the value
val. (This happens after any BEGIN block(s) have been run.) Command
line variable assignment is most useful for dynamically assigning val-
ues to the variables AWK uses to control how input is broken into
fields and records. It is also useful for controlling state if multi-
ple passes are needed over a single data file.
If the value of a particular element of ARGV is empty (""), gawk skips
over it.
For each record in the input, gawk tests to see if it matches any
pattern in the AWK program. For each pattern that the record matches,
the associated action is executed. The patterns are tested in the
order they occur in the program.
Finally, after all the input is exhausted, gawk executes the code in
the END block(s) (if any).
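The execution order described above can be observed with a short program. This is a minimal sketch; the variable names early and late and the data file are made up. A -v assignment is visible in BEGIN, while a var=val operand on the command line takes effect only after BEGIN has run.

```shell
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/data.txt"

order=$(awk -v early=set '
    BEGIN { print "BEGIN: early=" early ", late=" late }
    { print "record " NR ": " $0 ", late=" late }
    END { print "END: " NR " records" }
' late=set "$dir/data.txt")

printf '%s\n' "$order"
rm -rf "$dir"
```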
VARIABLES, RECORDS AND FIELDS
AWK variables are dynamic; they come into existence when they are first
used. Their values are either floating-point numbers or strings, or
both, depending upon how they are used. AWK also has one dimensional
arrays; arrays with multiple dimensions may be simulated. Several pre-
defined variables are set as a program runs; these are described as
needed and summarized below.
Records
Normally, records are separated by newline characters. You can control
how records are separated by assigning values to the built-in variable
RS. If RS is any single character, that character separates records.
Otherwise, RS is a regular expression. Text in the input that matches
this regular expression separates the record. However, in compatibil-
ity mode, only the first character of its string value is used for sep-
arating records. If RS is set to the null string, then records are
separated by blank lines. When RS is set to the null string, the new-
line character always acts as a field separator, in addition to what-
ever value FS may have.
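Two of the RS settings described above are sketched below with made-up input: a single-character separator, and the null string, which makes blank lines separate records and lets newlines act as field separators.

```shell
# Single-character RS: records are separated by ";".
semi=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ":" $0 }')

# RS = "": records are separated by blank lines, and newline also
# separates fields, so $2 and $4 come from different input lines.
paragraphs=$(printf 'name alice\nrole admin\n\nname bob\nrole guest\n' |
    awk 'BEGIN { RS = "" } { print NR ": " $2 " is " $4 }')

printf '%s\n' "$semi" "$paragraphs"
```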
Fields
As each input record is read, gawk splits the record into fields, using
the value of the FS variable as the field separator. If FS is a single
character, fields are separated by that character. If FS is the null
string, then each individual character becomes a separate field. Oth-
erwise, FS is expected to be a full regular expression. In the special
case that FS is a single space, fields are separated by runs of spaces
and/or tabs and/or newlines. (But see the section POSIX COMPATIBILITY,
below). NOTE: The value of IGNORECASE (see below) also affects how
fields are split when FS is a regular expression, and how records are
separated when RS is a regular expression.
If the FIELDWIDTHS variable is set to a space separated list of num-
bers, each field is expected to have fixed width, and gawk splits up
the record using the specified widths. The value of FS is ignored.
Assigning a new value to FS overrides the use of FIELDWIDTHS, and
restores the default behavior.
Each field in the input record may be referenced by its position, $1,
$2, and so on. $0 is the whole record. Fields need not be referenced
by constants:
n = 5
print $n
prints the fifth field in the input record.
The variable NF is set to the total number of fields in the input
record.
References to non-existent fields (i.e. fields after $NF) produce the
null-string. However, assigning to a non-existent field (e.g., $(NF+2)
= 5) increases the value of NF, creates any intervening fields with the
null string as their value, and causes the value of $0 to be recom-
puted, with the fields being separated by the value of OFS. References
to negative numbered fields cause a fatal error. Decrementing NF
causes the values of fields past the new value to be lost, and the
value of $0 to be recomputed, with the fields being separated by the
value of OFS.
Assigning a value to an existing field causes the whole record to be
rebuilt when $0 is referenced. Similarly, assigning a value to $0
causes the record to be resplit, creating new values for the fields.
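The field behavior described above can be sketched on a single input line. The input is made up; the program references a field through a variable, assigns past NF, and then lowers NF, printing the rebuilt record each time.

```shell
fields=$(printf 'a b c\n' | awk '
    {
        n = 2
        print "NF=" NF ", $n=" $n      # field selected by a variable
        OFS = "-"
        $(NF + 2) = "e"                # creates an empty $4 and sets $5
        print "NF=" NF ", $0=" $0      # $0 is rebuilt using OFS
        NF = 2                         # fields past the second are lost
        print "NF=" NF ", $0=" $0
    }')
printf '%s\n' "$fields"
```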
Built-in Variables
Gawk’s built-in variables are:
ARGC The number of command line arguments (does not include
options to gawk, or the program source).
ARGIND The index in ARGV of the current file being processed.
ARGV Array of command line arguments. The array is indexed from
0 to ARGC - 1. Dynamically changing the contents of ARGV
can control the files used for data.
BINMODE On non-POSIX systems, specifies use of “binary” mode for
all file I/O. Numeric values of 1, 2, or 3, specify that
input files, output files, or all files, respectively,
should use binary I/O. String values of "r", or "w" spec-
ify that input files, or output files, respectively, should
use binary I/O. String values of "rw" or "wr" specify that
all files should use binary I/O. Any other string value is
treated as "rw", but generates a warning message.
CONVFMT The conversion format for numbers, "%.6g", by default.
ENVIRON An array containing the values of the current environment.
The array is indexed by the environment variables, each
element being the value of that variable (e.g., ENVI-
RON["HOME"] might be /home/arnold). Changing this array
does not affect the environment seen by programs which gawk
spawns via redirection or the system() function.
ERRNO If a system error occurs either doing a redirection for
getline, during a read for getline, or during a close(),
then ERRNO will contain a string describing the error. The
value is subject to translation in non-English locales.
FIELDWIDTHS A white-space separated list of fieldwidths. When set,
gawk parses the input into fields of fixed width, instead
of using the value of the FS variable as the field separa-
tor.
FILENAME The name of the current input file. If no files are speci-
fied on the command line, the value of FILENAME is “-”.
However, FILENAME is undefined inside the BEGIN block
(unless set by getline).
FNR The input record number in the current input file.
FS The input field separator, a space by default. See Fields,
above.
IGNORECASE Controls the case-sensitivity of all regular expression and
string operations. If IGNORECASE has a non-zero value,
then string comparisons and pattern matching in rules,
field splitting with FS, record separating with RS, regular
expression matching with ~ and !~, and the gensub(),
gsub(), index(), match(), split(), and sub() built-in func-
tions all ignore case when doing regular expression opera-
tions. NOTE: Array subscripting is not affected. However,
the asort() and asorti() functions are affected.
Thus, if IGNORECASE is not equal to zero, /aB/ matches all
of the strings "ab", "aB", "Ab", and "AB". As with all AWK
variables, the initial value of IGNORECASE is zero, so all
regular expression and string operations are normally case-
sensitive. Under Unix, the full ISO 8859-1 Latin-1 charac-
ter set is used when ignoring case. As of gawk 3.1.4, the
case equivalencies are fully locale-aware, based on the C
<ctype.h> facilities such as isalpha(), and toupper().
LINT Provides dynamic control of the --lint option from within
an AWK program. When true, gawk prints lint warnings. When
false, it does not. When assigned the string value
"fatal", lint warnings become fatal errors, exactly like
--lint=fatal. Any other true value just prints warnings.
NF The number of fields in the current input record.
NR The total number of input records seen so far.
OFMT The output format for numbers, "%.6g", by default.
OFS The output field separator, a space by default.
ORS The output record separator, by default a newline.
PROCINFO The elements of this array provide access to information
about the running AWK program. On some systems, there may
be elements in the array, "group1" through "groupn" for
some n, which is the number of supplementary groups that
the process has. Use the in operator to test for these
elements. The following elements are guaranteed to be
available:
PROCINFO["egid"] the value of the getegid(2) system call.
PROCINFO["euid"] the value of the geteuid(2) system call.
PROCINFO["FS"] "FS" if field splitting with FS is in
effect, or "FIELDWIDTHS" if field split-
ting with FIELDWIDTHS is in effect.
PROCINFO["gid"] the value of the getgid(2) system call.
PROCINFO["pgrpid"] the process group ID of the current pro-
cess.
PROCINFO["pid"] the process ID of the current process.
PROCINFO["ppid"] the parent process ID of the current
process.
PROCINFO["uid"] the value of the getuid(2) system call.
PROCINFO["version"]
The version of gawk. This is available
from version 3.1.4 and later.
RS The input record separator, by default a newline.
RT The record terminator. Gawk sets RT to the input text that
matched the character or regular expression specified by
RS.
RSTART The index of the first character matched by match(); 0 if
no match. (This implies that character indices start at
one.)
RLENGTH The length of the string matched by match(); -1 if no
match.
SUBSEP The character used to separate multiple subscripts in array
elements, by default "\034".
TEXTDOMAIN The text domain of the AWK program; used to find the local-
ized translations for the program’s strings.
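As a small illustration of three of the variables listed above, the following sketch prints FILENAME, FNR, and NR for two made-up input files. FNR restarts at 1 for each file, while NR keeps counting across files.

```shell
dir=$(mktemp -d)
printf 'x\ny\n' > "$dir/a.txt"
printf 'z\n' > "$dir/b.txt"

# Run from inside the directory so FILENAME shows the short names.
counters=$(cd "$dir" && awk '{ print FILENAME, FNR, NR }' a.txt b.txt)

printf '%s\n' "$counters"
rm -rf "$dir"
```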
Arrays
Arrays are subscripted with an expression between square brackets ([
and ]). If the expression is an expression list (expr, expr ...) then
the array subscript is a string consisting of the concatenation of the
(string) value of each expression, separated by the value of the SUBSEP
variable. This facility is used to simulate multiply dimensioned
arrays. For example:
i = "A"; j = "B"; k = "C"
x[i, j, k] = "hello, world\n"
assigns the string "hello, world\n" to the element of the array x which
is indexed by the string "A\034B\034C". All arrays in AWK are associa-
tive, i.e. indexed by string values.
The special operator in may be used to test if an array has an index
consisting of a particular value.
if (val in array)
print array[val]
If the array has multiple subscripts, use (i, j) in array.
The in construct may also be used in a for loop to iterate over all the
elements of an array.
An element may be deleted from an array using the delete statement.
The delete statement may also be used to delete the entire contents of
an array, just by specifying the array name without a subscript.
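The array operations above can be sketched in a single BEGIN block. The array name x and its contents are arbitrary; the loop splits the stored key on SUBSEP to show that a multiple subscript is really one string.

```shell
arrays=$(awk 'BEGIN {
    i = "A"; j = "B"
    x[i, j] = "hello"
    if ((i, j) in x) print "found " x[i, j]
    # The stored key is the two subscripts joined by SUBSEP.
    for (key in x) { split(key, parts, SUBSEP); print parts[1] "+" parts[2] }
    delete x[i, j]
    if (!((i, j) in x)) print "deleted"
}')
printf '%s\n' "$arrays"
```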
Variable Typing And Conversion
Variables and fields may be (floating point) numbers, or strings, or
both. How the value of a variable is interpreted depends upon its con-
text. If used in a numeric expression, it will be treated as a number;
if used as a string it will be treated as a string.
To force a variable to be treated as a number, add 0 to it; to force it
to be treated as a string, concatenate it with the null string.
When a string must be converted to a number, the conversion is accom-
plished using strtod(3). A number is converted to a string by using
the value of CONVFMT as a format string for sprintf(3), with the
numeric value of the variable as the argument. However, even though
all numbers in AWK are floating-point, integral values are always con-
verted as integers. Thus, given
CONVFMT = "%2.2f"
a = 12
b = a ""
the variable b has a string value of "12" and not "12.00".
When operating in POSIX mode (such as with the --posix command line
option), beware that locale settings may interfere with the way decimal
numbers are treated: the decimal separator of the numbers you are feed-
ing to gawk must conform to what your locale would expect, be it a
comma (,) or a period (.).
Gawk performs comparisons as follows: If two variables are numeric,
they are compared numerically. If one value is numeric and the other
has a string value that is a “numeric string,” then comparisons are
also done numerically. Otherwise, the numeric value is converted to a
string and a string comparison is performed. Two strings are compared,
of course, as strings.
Note that string constants, such as "57", are not numeric strings, they
are string constants. The idea of “numeric string” only applies to
fields, getline input, FILENAME, ARGV elements, ENVIRON elements and
the elements of an array created by split() that are numeric strings.
The basic idea is that user input, and only user input, that looks
numeric, should be treated that way.
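The comparison and conversion rules above can be sketched with one line of made-up input. Fields that look numeric are compared as numbers, string constants are compared as strings, and CONVFMT applies only to non-integral values.

```shell
typing=$(printf '10 9\n' | awk '
    {
        print ($1 < $2)          # numeric strings from input: 10 < 9 is false
        print ("10" < "9")       # string constants: "10" sorts before "9"
        CONVFMT = "%2.2f"
        a = 12;      b = a ""    # integral value is converted as an integer
        c = 3.14159; d = c ""    # non-integral value goes through CONVFMT
        print b, d
    }')
printf '%s\n' "$typing"
```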
Uninitialized variables have the numeric value 0 and the string value
"" (the null, or empty, string).
Octal and Hexadecimal Constants
Starting with version 3.1 of gawk, you may use C-style octal and
hexadecimal constants in your AWK program source code. For example, the
octal value 011 is equal to decimal 9, and the hexadecimal value 0x11
is equal to decimal 17.
String Constants
String constants in AWK are sequences of characters enclosed between
double quotes ("). Within strings, certain escape sequences are recog-
nized, as in C. These are:
\\ A literal backslash.
\a The “alert” character; usually the ASCII BEL character.
\b backspace.
\f form-feed.
\n newline.
\r carriage return.
\t horizontal tab.
\v vertical tab.
\xhex digits
The character represented by the string of hexadecimal digits fol-
lowing the \x. As in ANSI C, all following hexadecimal digits are
considered part of the escape sequence. (This feature should tell
us something about language design by committee.) E.g., "\x1B" is
the ASCII ESC (escape) character.
\ddd The character represented by the 1-, 2-, or 3-digit sequence of
octal digits. E.g., "\033" is the ASCII ESC (escape) character.
\c The literal character c.
The escape sequences may also be used inside constant regular expres-
sions (e.g., /[ \t\f\n\r\v]/ matches whitespace characters).
In compatibility mode, the characters represented by octal and hexadec-
imal escape sequences are treated literally when used in regular
expression constants. Thus, /a\52b/ is equivalent to /a\*b/.
PATTERNS AND ACTIONS
AWK is a line-oriented language. The pattern comes first, and then the
action. Action statements are enclosed in { and }. Either the pattern
may be missing, or the action may be missing, but, of course, not both.
If the pattern is missing, the action is executed for every single
record of input. A missing action is equivalent to
{ print }
which prints the entire record.
Comments begin with the “#” character, and continue until the end of
the line. Blank lines may be used to separate statements. Normally, a
statement ends with a newline, however, this is not the case for lines
ending in a “,”, {, ?, :, &&, or ||. Lines ending in do or else also
have their statements automatically continued on the following line.
In other cases, a line can be continued by ending it with a “\”, in
which case the newline will be ignored.
Multiple statements may be put on one line by separating them with a
“;”. This applies to both the statements within the action part of a
pattern-action pair (the usual case), and to the pattern-action state-
ments themselves.
Patterns
AWK patterns may be one of the following:
BEGIN
END
/regular expression/
relational expression
pattern && pattern
pattern || pattern
pattern ? pattern : pattern
(pattern)
! pattern
pattern1, pattern2
BEGIN and END are two special kinds of patterns which are not tested
against the input. The action parts of all BEGIN patterns are merged
as if all the statements had been written in a single BEGIN block.
They are executed before any of the input is read. Similarly, all the
END blocks are merged, and executed when all the input is exhausted (or
when an exit statement is executed). BEGIN and END patterns cannot be
combined with other patterns in pattern expressions. BEGIN and END
patterns cannot have missing action parts.
For /regular expression/ patterns, the associated statement is executed
for each input record that matches the regular expression. Regular
expressions are the same as those in egrep(1), and are summarized
below.
A relational expression may use any of the operators defined below in
the section on actions. These generally test whether certain fields
match certain regular expressions.
The &&, ||, and ! operators are logical AND, logical OR, and logical
NOT, respectively, as in C. They do short-circuit evaluation, also as
in C, and are used for combining more primitive pattern expressions.
As in most languages, parentheses may be used to change the order of
evaluation.
The ?: operator is like the same operator in C. If the first pattern
is true then the pattern used for testing is the second pattern, other-
wise it is the third. Only one of the second and third patterns is
evaluated.
The pattern1, pattern2 form of an expression is called a range pattern.
It matches all input records starting with a record that matches pat-
tern1, and continuing until a record that matches pattern2, inclusive.
It does not combine with any other sort of pattern expression.
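A short illustration of a range pattern, using invented marker lines START and END and assuming a POSIX awk:

```shell
# Prints every record from the first START match through the END match.
out_range=$(printf 'a\nSTART\nb\nEND\nc\n' | awk '/START/,/END/')
printf '%s\n' "$out_range"
```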
Regular Expressions
Regular expressions are the extended kind found in egrep. They are
composed of characters as follows:
c matches the non-metacharacter c.
\c matches the literal character c.
. matches any character including newline.
^ matches the beginning of a string.
$ matches the end of a string.
[abc...] character list, matches any of the characters abc....
[^abc...] negated character list, matches any character except abc....
r1|r2 alternation: matches either r1 or r2.
r1r2 concatenation: matches r1, and then r2.
r+ matches one or more r’s.
r* matches zero or more r’s.
r? matches zero or one r’s.
(r) grouping: matches r.
r{n}
r{n,}
r{n,m} One or two numbers inside braces denote an interval expres-
sion. If there is one number in the braces, the preceding
regular expression r is repeated n times. If there are two
numbers separated by a comma, r is repeated n to m times.
If there is one number followed by a comma, then r is
repeated at least n times.
Interval expressions are only available if either --posix or
--re-interval is specified on the command line.
\y matches the empty string at either the beginning or the end
of a word.
\B matches the empty string within a word.
\< matches the empty string at the beginning of a word.
\> matches the empty string at the end of a word.
\w matches any word-constituent character (letter, digit, or
underscore).
\W matches any character that is not word-constituent.
\` matches the empty string at the beginning of a buffer
(string).
\' matches the empty string at the end of a buffer.
The escape sequences that are valid in string constants (see below) are
also valid in regular expressions.
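As a rough sketch of how the basic operators combine (anchors, grouping, +, ? and an escaped metacharacter), with sample input chosen only for illustration:

```shell
# One or more "ab" groups, optionally followed by "c", and nothing else.
out_re=$(printf 'ab\nabab\nabc\nabd\nc\n' | awk '/^(ab)+c?$/')
printf '%s\n' "$out_re"

# A backslash makes the dot literal, so "axb" does not match.
out_dot=$(printf 'a.b\naxb\n' | awk '/^a\.b$/')
printf '%s\n' "$out_dot"
```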
Character classes are a feature introduced in the POSIX standard. A
character class is a special notation for describing lists of charac-
ters that have a specific attribute, but where the actual characters
themselves can vary from country to country and/or from character set
to character set. For example, the notion of what is an alphabetic
character differs in the USA and in France.
A character class is only valid in a regular expression inside the
brackets of a character list. Character classes consist of [:, a key-
word denoting the class, and :]. The character classes defined by the
POSIX standard are:
[:alnum:] Alphanumeric characters.
[:alpha:] Alphabetic characters.
[:blank:] Space or tab characters.
[:cntrl:] Control characters.
[:digit:] Numeric characters.
[:graph:] Characters that are both printable and visible. (A space is
printable, but not visible, while an a is both.)
[:lower:] Lower-case alphabetic characters.
[:print:] Printable characters (characters that are not control char-
acters.)
[:punct:] Punctuation characters (characters that are not letters,
digits, control characters, or space characters).
[:space:] Space characters (such as space, tab, and formfeed, to name
a few).
[:upper:] Upper-case alphabetic characters.
[:xdigit:] Characters that are hexadecimal digits.
For example, before the POSIX standard, to match alphanumeric charac-
ters, you would have had to write /[A-Za-z0-9]/. If your character set
had other alphabetic characters in it, this would not match them, and
if your character set collated differently from ASCII, this might not
even match the ASCII alphanumeric characters. With the POSIX character
classes, you can write /[[:alnum:]]/, and this matches the alphabetic
and numeric characters in your character set, no matter what it is.
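A small example of character classes inside bracket expressions. It assumes an awk with POSIX class support (some very old awk builds lack it), and the input is invented:

```shell
out_class=$(printf '123\n12a\nabc\n' | awk '
/^[[:digit:]]+$/ { print "digits:", $0 }    # only numeric characters
/^[[:alpha:]]+$/ { print "letters:", $0 }   # only alphabetic characters
')
printf '%s\n' "$out_class"
```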
Two additional special sequences can appear in character lists. These
apply to non-ASCII character sets, which can have single symbols
(called collating elements) that are represented with more than one
character, as well as several characters that are equivalent for
collating, or sorting, purposes. (E.g., in French, a plain “e” and a
grave-accented “è” are equivalent.)
Collating Symbols
A collating symbol is a multi-character collating element
enclosed in [. and .]. For example, if ch is a collating ele-
ment, then [[.ch.]] is a regular expression that matches this
collating element, while [ch] is a regular expression that
matches either c or h.
Equivalence Classes
An equivalence class is a locale-specific name for a list of
characters that are equivalent. The name is enclosed in [= and
=]. For example, the name e might be used to represent all of
“e,” “é,” and “è.” In this case, [[=e=]] is a regular expression
that matches any of e, é, or è.
These features are very valuable in non-English speaking locales. The
library functions that gawk uses for regular expression matching cur-
rently only recognize POSIX character classes; they do not recognize
collating symbols or equivalence classes.
The \y, \B, \<, \>, \w, \W, \`, and \' operators are specific to gawk;
they are extensions based on facilities in the GNU regular expression
libraries.
The various command line options control how gawk interprets characters
in regular expressions.
No options
In the default case, gawk provides all the facilities of POSIX
regular expressions and the GNU regular expression operators
described above. However, interval expressions are not sup-
ported.
--posix
Only POSIX regular expressions are supported, the GNU operators
are not special. (E.g., \w matches a literal w). Interval
expressions are allowed. PLEASE NOTE that the regular expression
[A-Z] will also match the lowercase characters in this case!
Consult the info pages of gawk utility for more information
about this behaviour.
--traditional
Traditional Unix awk regular expressions are matched. The GNU
operators are not special, interval expressions are not avail-
able, and neither are the POSIX character classes ([[:alnum:]]
and so on). Characters described by octal and hexadecimal
escape sequences are treated literally, even if they represent
regular expression metacharacters.
--re-interval
Allow interval expressions in regular expressions, even if
--traditional has been provided.
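A hedged sketch of enabling interval expressions explicitly. It runs only if a gawk binary is found; later gawk releases are understood to accept the option as well, and the input is illustrative:

```shell
if command -v gawk >/dev/null 2>&1; then
  # a{2} means exactly two occurrences of "a".
  out_interval=$(printf 'a\naa\naaa\n' | gawk --re-interval '/^a{2}$/')
  printf '%s\n' "$out_interval"
fi
```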
Actions
Action statements are enclosed in braces, { and }. Action statements
consist of the usual assignment, conditional, and looping statements
found in most languages. The operators, control statements, and
input/output statements available are patterned after those in C.
Operators
The operators in AWK, in order of decreasing precedence, are
(...) Grouping
$ Field reference.
++ -- Increment and decrement, both prefix and postfix.
^ Exponentiation (** may also be used, and **= for the
assignment operator).
+ - ! Unary plus, unary minus, and logical negation.
* / % Multiplication, division, and modulus.
+ - Addition and subtraction.
space String concatenation.
| |& Piped I/O for getline, print, and printf.
< >
<= >=
!= == The regular relational operators.
~ !~ Regular expression match, negated match. NOTE: Do not use
a constant regular expression (/foo/) on the left-hand side
of a ~ or !~. Only use one on the right-hand side. The
expression /foo/ ~ exp has the same meaning as (($0 ~
/foo/) ~ exp). This is usually not what was intended.
in Array membership.
&& Logical AND.
|| Logical OR.
?: The C conditional expression. This has the form expr1 ?
expr2 : expr3. If expr1 is true, the value of the expres-
sion is expr2, otherwise it is expr3. Only one of expr2
and expr3 is evaluated.
= += -=
*= /= %= ^= Assignment. Both absolute assignment (var = value) and
operator-assignment (the other forms) are supported.
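The precedence table has a few practical consequences, sketched below with an invented three-field record and assuming a POSIX awk:

```shell
out_ops=$(echo '10 20 30' | awk '{
  print $NF - 1          # $ binds tighter than -, so this is ($NF) - 1
  print $(NF - 1)        # parentheses select the next-to-last field
  print 1 + 2 " " 3 + 4  # + binds tighter than concatenation
  print ($2 ~ /^2/ ? "match" : "no match")   # regexp on the right of ~
}')
printf '%s\n' "$out_ops"
```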
Control Statements
The control statements are as follows:
if (condition) statement [ else statement ]
while (condition) statement
do statement while (condition)
for (expr1; expr2; expr3) statement
for (var in array) statement
break
continue
delete array[index]
delete array
exit [ expression ]
{ statements }
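A brief sketch that exercises several of these statements together. The array name seen and the input record are hypothetical:

```shell
out_ctl=$(printf '1 2 3 4 5\n' | awk '{
  for (i = 1; i <= NF; i++) {
    if ($i % 2 == 0)
      continue               # skip even fields
    sum += $i
  }
  seen["odd"] = sum; seen["tmp"] = 1
  delete seen["tmp"]         # remove a single element
  n = 0
  for (k in seen) n++        # count the remaining elements
  print sum, n, ("tmp" in seen)
}')
printf '%s\n' "$out_ctl"
```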
I/O Statements
The input/output statements are as follows:
close(file [, how]) Close file, pipe or co-process. The optional how
should only be used when closing one end of a
two-way pipe to a co-process. It must be a
string value, either "to" or "from".
getline Set $0 from next input record; set NF, NR, FNR.
getline <file Set $0 from next record of file; set NF.
getline var Set var from next input record; set NR, FNR.
getline var <file Set var from next record of file.
command | getline [var]
Run command piping the output either into $0 or
var, as above.
command |& getline [var]
Run command as a co-process piping the output
either into $0 or var, as above. Co-processes
are a gawk extension. (command can also be a
socket. See the subsection Special File Names,
below.)
next Stop processing the current input record. The
next input record is read and processing starts
over with the first pattern in the AWK program.
If the end of the input data is reached, the END
block(s), if any, are executed.
nextfile Stop processing the current input file. The next
input record read comes from the next input file.
FILENAME and ARGIND are updated, FNR is reset to
1, and processing starts over with the first pat-
tern in the AWK program. If the end of the input
data is reached, the END block(s), if any, are
executed.
print Prints the current record. The output record is
terminated with the value of the ORS variable.
print expr-list Prints expressions. Each expression is separated
by the value of the OFS variable. The output
record is terminated with the value of the ORS
variable.
print expr-list >file Prints expressions on file. Each expression is
separated by the value of the OFS variable. The
output record is terminated with the value of the
ORS variable.
printf fmt, expr-list Format and print.
printf fmt, expr-list >file
Format and print on file.
system(cmd-line) Execute the command cmd-line, and return the exit
status. (This may not be available on non-POSIX
systems.)
fflush([file]) Flush any buffers associated with the open output
file or pipe file. If file is missing, then
standard output is flushed. If file is the null
string, then all open output files and pipes have
their buffers flushed.
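To illustrate how print uses OFS and ORS, here is a small sketch with separator values picked arbitrarily for the demonstration:

```shell
out_print=$(printf 'a b\nc d\n' | awk '
BEGIN { OFS = "-"; ORS = ";\n" }   # field and record output separators
{ print $2, $1 }                   # the comma inserts OFS between values
')
printf '%s\n' "$out_print"
```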
Additional output redirections are allowed for print and printf.
print ... >> file
Appends output to the file.
print ... | command
Writes on a pipe.
print ... |& command
Sends data to a co-process or socket. (See also the subsection
Special File Names, below.)
The getline command returns 1 on success, 0 on end of file, and -1 on
an error. Upon an error, ERRNO contains a string describing the prob-
lem.
NOTE: If using a pipe, co-process, or socket to getline, or from print
or printf within a loop, you must use close() to create new instances
of the command or socket. AWK does not automatically close pipes,
sockets, or co-processes when they return EOF.
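The interaction between getline and close() can be sketched as follows. The command string is an arbitrary example, and a POSIX awk is assumed:

```shell
out_getline=$(awk 'BEGIN {
  cmd = "echo hello"
  for (i = 1; i <= 2; i++) {
    while ((cmd | getline line) > 0)   # 1 per record, 0 at EOF, -1 on error
      print i ": " line
    close(cmd)                         # without this, pass 2 reads nothing
  }
}')
printf '%s\n' "$out_getline"
```

Parenthesizing the getline expression before comparing it with 0 avoids precedence surprises.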
The printf Statement
The AWK versions of the printf statement and sprintf() function (see
below) accept the following conversion specification formats:
%c An ASCII character. If the argument used for %c is numeric, it
is treated as a character and printed. Otherwise, the argument
is assumed to be a string, and only the first character of that
string is printed.
%d, %i A decimal number (the integer part).
%e, %E A floating point number of the form [-]d.dddddde[+-]dd. The %E
format uses E instead of e.
%f, %F A floating point number of the form [-]ddd.dddddd. If the sys-
tem library supports it, %F is available as well. This is like
%f, but uses capital letters for special “not a number” and
“infinity” values. If %F is not available, gawk uses %f.
%g, %G Use %e or %f conversion, whichever is shorter, with nonsignifi-
cant zeros suppressed. The %G format uses %E instead of %e.
%o An unsigned octal number (also an integer).
%u An unsigned decimal number (again, an integer).
%s A character string.
%x, %X An unsigned hexadecimal number (an integer). The %X format
uses ABCDEF instead of abcdef.
%% A single % character; no argument is converted.
NOTE: When using the integer format-control letters for values that are
outside the range of a C long integer, gawk switches to the %0f format
specifier. If --lint is provided on the command line gawk warns about
this. Other versions of awk may print invalid values or do something
else entirely.
Optional, additional parameters may lie between the % and the control
letter:
count$ Use the count’th argument at this point in the formatting. This
is called a positional specifier and is intended primarily for
use in translated versions of format strings, not in the origi-
nal text of an AWK program. It is a gawk extension.
- The expression should be left-justified within its field.
space For numeric conversions, prefix positive values with a space,
and negative values with a minus sign.
+ The plus sign, used before the width modifier (see below), says
to always supply a sign for numeric conversions, even if the
data to be formatted is positive. The + overrides the space
modifier.
# Use an “alternate form” for certain control letters. For %o,
supply a leading zero. For %x and %X, supply a leading 0x or
0X for a nonzero result. For %e, %E, %f and %F, the result
always contains a decimal point. For %g and %G, trailing zeros
are not removed from the result.
0 A leading 0 (zero) acts as a flag that indicates output should
be padded with zeroes instead of spaces. This applies even to
non-numeric output formats. This flag only has an effect when
the field width is wider than the value to be printed.
width The field should be padded to this width. The field is normally
padded with spaces. If the 0 flag has been used, it is padded
with zeroes.
.prec A number that specifies the precision to use when printing. For
the %e, %E, %f and %F formats, this specifies the number of
digits you want printed to the right of the decimal point. For
the %g and %G formats, it specifies the maximum number of
significant digits. For the %d, %o, %i, %u, %x, and %X formats, it
specifies the minimum number of digits to print. For %s, it
specifies the maximum number of characters from the string that
should be printed.
The dynamic width and prec capabilities of the ANSI C printf() routines
are supported. A * in place of either the width or prec specifications
causes their values to be taken from the argument list to printf or
sprintf(). To use a positional specifier with a dynamic width or pre-
cision, supply the count$ after the * in the format string. For exam-
ple, "%3$*2$.*1$s".
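A compact sketch of several conversions and modifiers. The values are arbitrary, and only features common to POSIX awk implementations are used:

```shell
out_fmt=$(awk 'BEGIN {
  # width.prec, left-justify, zero-pad, hex, numeric %c, string prec, sign
  printf "[%5.2f] [%-5s] [%05d] [%x] [%c] [%.3s] [%+d]\n",
         3.14159, "ab", 42, 255, 65, "abcdef", 7
}')
printf '%s\n' "$out_fmt"
```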
Special File Names
When doing I/O redirection from either print or printf into a file, or
via getline from a file, gawk recognizes certain special filenames
internally. These filenames allow access to open file descriptors
inherited from gawk’s parent process (usually the shell). These file
names may also be used on the command line to name data files. The
filenames are:
/dev/stdin The standard input.
/dev/stdout The standard output.
/dev/stderr The standard error output.
/dev/fd/n The file associated with the open file descriptor n.
These are particularly useful for error messages. For example:
print "You blew it!" > "/dev/stderr"
whereas you would otherwise have to use
print "You blew it!" | "cat 1>&2"
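A quick way to see that the two streams really are separate, assuming a system where /dev/stderr is usable (gawk interprets the name itself; other awks rely on the operating system):

```shell
prog='BEGIN { print "result"; print "warning" > "/dev/stderr" }'
out_stdout=$(awk "$prog" 2>/dev/null)        # keep only standard output
out_stderr=$(awk "$prog" 2>&1 >/dev/null)    # keep only standard error
printf 'stdout=%s stderr=%s\n' "$out_stdout" "$out_stderr"
```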
The following special filenames may be used with the |& co-process
operator for creating TCP/IP network connections.
/inet/tcp/lport/rhost/rport File for TCP/IP connection on local port
lport to remote host rhost on remote port
rport. Use a port of 0 to have the system
pick a port.
/inet/udp/lport/rhost/rport Similar, but use UDP/IP instead of TCP/IP.
/inet/raw/lport/rhost/rport Reserved for future use.
Other special filenames provide access to information about the running
gawk process. These filenames are now obsolete. Use the PROCINFO
array to obtain the information they provide. The filenames are:
/dev/pid Reading this file returns the process ID of the current
process, in decimal, terminated with a newline.
/dev/ppid Reading this file returns the parent process ID of the cur-
rent process, in decimal, terminated with a newline.
/dev/pgrpid Reading this file returns the process group ID of the cur-
rent process, in decimal, terminated with a newline.
/dev/user Reading this file returns a single record terminated with a
newline. The fields are separated with spaces. $1 is the
value of the getuid(2) system call, $2 is the value of the
geteuid(2) system call, $3 is the value of the getgid(2)
system call, and $4 is the value of the getegid(2) system
call. If there are any additional fields, they are the
group IDs returned by getgroups(2). Multiple groups may
not be supported on all systems.
Numeric Functions
AWK has the following built-in arithmetic functions:
atan2(y, x) Returns the arctangent of y/x in radians.
cos(expr) Returns the cosine of expr, which is in radians.
exp(expr) The exponential function.
int(expr) Truncates to integer.
log(expr) The natural logarithm function.
rand() Returns a random number N, between 0 and 1, such that 0 ≤
N < 1.
sin(expr) Returns the sine of expr, which is in radians.
sqrt(expr) The square root function.
srand([expr]) Uses expr as a new seed for the random number generator.
If no expr is provided, the time of day is used. The
return value is the previous seed for the random number
generator.
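A short sketch of the arithmetic functions. The seed value 42 is arbitrary; the example only checks that reseeding reproduces the same number rather than assuming any particular random value:

```shell
out_num=$(awk 'BEGIN {
  printf "%d %d %.4f\n", int(-3.7), sqrt(16), atan2(0, -1)
  srand(42); a = rand()
  prev = srand(42); b = rand()     # srand() returns the previous seed
  print prev, (a == b), (a >= 0 && a < 1)
}')
printf '%s\n' "$out_num"
```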
String Functions
Gawk has the following built-in string functions:
asort(s [, d]) Returns the number of elements in the source
array s. The contents of s are sorted using
gawk’s normal rules for comparing values, and
the indices of the sorted values of s are
replaced with sequential integers starting with
1. If the optional destination array d is spec-
ified, then s is first duplicated into d, and
then d is sorted, leaving the indices of the
source array s unchanged.
asorti(s [, d]) Returns the number of elements in the source
array s. The behavior is the same as that of
asort(), except that the array indices are used
for sorting, not the array values. When done,
the array is indexed numerically, and the val-
ues are those of the original indices. The
original values are lost; thus provide a second
array if you wish to preserve the original.
gensub(r, s, h [, t]) Search the target string t for matches of the
regular expression r. If h is a string
beginning with g or G, then replace all matches
of r with s. Otherwise, h is a number indicat-
ing which match of r to replace. If t is not
supplied, $0 is used instead. Within the
replacement text s, the sequence \n, where n is
a digit from 1 to 9, may be used to indicate
just the text that matched the n’th parenthe-
sized subexpression. The sequence \0 repre-
sents the entire matched text, as does the
character &. Unlike sub() and gsub(), the mod-
ified string is returned as the result of the
function, and the original target string is not
changed.
gsub(r, s [, t]) For each substring matching the regular expres-
sion r in the string t, substitute the string
s, and return the number of substitutions. If
t is not supplied, use $0. An & in the
replacement text is replaced with the text that
was actually matched. Use \& to get a literal
&. (This must be typed as "\\&"; see GAWK:
Effective AWK Programming for a fuller discus-
sion of the rules for &’s and backslashes in
the replacement text of sub(), gsub(), and gen-
sub().)
index(s, t) Returns the index of the string t in the string
s, or 0 if t is not present. (This implies
that character indices start at one.)
length([s]) Returns the length of the string s, or the
length of $0 if s is not supplied. Starting
with version 3.1.5, as a non-standard exten-
sion, with an array argument, length() returns
the number of elements in the array.
match(s, r [, a]) Returns the position in s where the regular
expression r occurs, or 0 if r is not present,
and sets the values of RSTART and RLENGTH.
Note that the argument order is the same as for
the ~ operator: str ~ re. If array a is pro-
vided, a is cleared and then elements 1 through
n are filled with the portions of s that match
the corresponding parenthesized subexpression
in r. The 0’th element of a contains the por-
tion of s matched by the entire regular expres-
sion r. Subscripts a[n, "start"], and a[n,
"length"] provide the starting index in the
string and length respectively, of each match-
ing substring.
split(s, a [, r]) Splits the string s into the array a on the
regular expression r, and returns the number of
fields. If r is omitted, FS is used instead.
The array a is cleared first. Splitting
behaves identically to field splitting,
described above.
sprintf(fmt, expr-list) Prints expr-list according to fmt, and returns
the resulting string.
strtonum(str) Examines str, and returns its numeric value.
If str begins with a leading 0, strtonum()
assumes that str is an octal number. If str
begins with a leading 0x or 0X, strtonum()
assumes that str is a hexadecimal number.
sub(r, s [, t]) Just like gsub(), but only the first matching
substring is replaced.
substr(s, i [, n]) Returns the at most n-character substring of s
starting at i. If n is omitted, the rest of s
is used.
tolower(str) Returns a copy of the string str, with all the
upper-case characters in str translated to
their corresponding lower-case counterparts.
Non-alphabetic characters are left unchanged.
toupper(str) Returns a copy of the string str, with all the
lower-case characters in str translated to
their corresponding upper-case counterparts.
Non-alphabetic characters are left unchanged.
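The functions above that are also available in POSIX awk can be tried with a sketch like this one. The strings and the array name parts are illustrative:

```shell
out_str=$(awk 'BEGIN {
  s = "hello world"
  print index(s, "wor"), length(s), substr(s, 7, 3)
  print toupper(substr(s, 1, 1)) substr(s, 2)
  n = gsub(/o/, "0", s)              # s is modified in place
  print n, s
  if (match(s, /w[0-9a-z]+/))
    print RSTART, RLENGTH            # position and length of the match
  sub(/w0rld/, "[&]", s)             # & stands for the matched text
  print s
  n = split("a:b:c", parts, ":")
  print n, parts[3]
}')
printf '%s\n' "$out_str"
```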
As of version 3.1.5, gawk is multibyte aware. This means that index(),
length(), substr() and match() all work in terms of characters, not
bytes.
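The gawk-specific string functions gensub(), asort() and strtonum() can be sketched as below. The block runs only when a gawk binary is found, and the data values are invented:

```shell
if command -v gawk >/dev/null 2>&1; then
  out_gawk_str=$(gawk 'BEGIN {
    # gensub returns the new string and leaves its target unchanged.
    print gensub(/(a+)(b+)/, "<\\2\\1>", "g", "aabb ab")
    v["x"] = 30; v["y"] = 10; v["z"] = 20
    n = asort(v)                     # values sorted, indices become 1..n
    print n, v[1], v[2], v[3]
    print strtonum("0x1F"), strtonum("017"), strtonum("17")
  }')
  printf '%s\n' "$out_gawk_str"
fi
```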
Time Functions
Since one of the primary uses of AWK programs is processing log files
that contain time stamp information, gawk provides the following func-
tions for obtaining time stamps and formatting them.
mktime(datespec)
Turns datespec into a time stamp of the same form as returned
by systime(). The datespec is a string of the form YYYY MM
DD HH MM SS[ DST]. The contents of the string are six or
seven numbers representing respectively the full year includ-
ing century, the month from 1 to 12, the day of the month
from 1 to 31, the hour of the day from 0 to 23, the minute
from 0 to 59, and the second from 0 to 60, and an optional
daylight saving flag. The values of these numbers need not
be within the ranges specified; for example, an hour of -1
means 1 hour before midnight. The origin-zero Gregorian cal-
endar is assumed, with year 0 preceding year 1 and year -1
preceding year 0. The time is assumed to be in the local
timezone. If the daylight saving flag is positive, the time
is assumed to be daylight saving time; if zero, the time is
assumed to be standard time; and if negative (the default),
mktime() attempts to determine whether daylight saving time
is in effect for the specified time. If datespec does not
contain enough elements or if the resulting time is out of
range, mktime() returns -1.
strftime([format [, timestamp[, utc-flag]]])
Formats timestamp according to the specification in format.
If utc-flag is present and is non-zero or non-null, the
result is in UTC, otherwise the result is in local time. The
timestamp should be of the same form as returned by sys-
time(). If timestamp is missing, the current time of day is
used. If format is missing, a default format equivalent to
the output of date(1) is used. See the specification for the
strftime() function in ANSI C for the format conversions that
are guaranteed to be available.
systime() Returns the current time of day as the number of seconds
since the Epoch (1970-01-01 00:00:00 UTC on POSIX systems).
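A hedged example of the time functions, run only if gawk is present. TZ=UTC is set here purely so that the local-time interpretation of mktime() is predictable:

```shell
if command -v gawk >/dev/null 2>&1; then
  out_time=$(TZ=UTC gawk 'BEGIN {
    t = mktime("2000 01 01 00 00 00")        # YYYY MM DD HH MM SS
    print t
    print strftime("%Y-%m-%d %H:%M:%S", t)   # format it back
    print (systime() > t)                    # the current time is later
  }')
  printf '%s\n' "$out_time"
fi
```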
Bit Manipulations Functions
Starting with version 3.1 of gawk, the following bit manipulation func-
tions are available. They work by converting double-precision floating
point values to uintmax_t integers, doing the operation, and then con-
verting the result back to floating point. The functions are:
and(v1, v2) Return the bitwise AND of the values provided by v1
and v2.
compl(val) Return the bitwise complement of val.
lshift(val, count) Return the value of val, shifted left by count
bits.
or(v1, v2) Return the bitwise OR of the values provided by v1
and v2.
rshift(val, count) Return the value of val, shifted right by count
bits.
xor(v1, v2) Return the bitwise XOR of the values provided by v1
and v2.
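For illustration, the bit functions applied to small values (12 is binary 1100, 10 is binary 1010). The block is skipped when gawk is not installed:

```shell
if command -v gawk >/dev/null 2>&1; then
  out_bits=$(gawk 'BEGIN {
    print and(12, 10), or(12, 10), xor(12, 10)
    print lshift(1, 4), rshift(16, 2)
  }')
  printf '%s\n' "$out_bits"
fi
```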
Internationalization Functions
Starting with version 3.1 of gawk, the following functions may be used
from within your AWK program for translating strings at run-time. For
full details, see GAWK: Effective AWK Programming.
bindtextdomain(directory [, domain])
Specifies the directory where gawk looks for the .mo files, in
case they will not or cannot be placed in the “standard” locations
(e.g., during testing). It returns the directory where
domain is “bound.”
The default domain is the value of TEXTDOMAIN. If directory is
the null string (""), then bindtextdomain() returns the current
binding for the given domain.
dcgettext(string [, domain [, category]])
Returns the translation of string in text domain domain for
locale category category. The default value for domain is the
current value of TEXTDOMAIN. The default value for category is
"LC_MESSAGES".
If you supply a value for category, it must be a string equal to
one of the known locale categories described in GAWK: Effective
AWK Programming. You must also supply a text domain. Use
TEXTDOMAIN if you want to use the current domain.
dcngettext(string1 , string2 , number [, domain [, category]])
Returns the plural form used for number of the translation of
string1 and string2 in text domain domain for locale category
category. The default value for domain is the current value of
TEXTDOMAIN. The default value for category is "LC_MESSAGES".
If you supply a value for category, it must be a string equal to
one of the known locale categories described in GAWK: Effective
AWK Programming. You must also supply a text domain. Use
TEXTDOMAIN if you want to use the current domain.
USER-DEFINED FUNCTIONS
Functions in AWK are defined as follows:
function name(parameter list) { statements }
Functions are executed when they are called from within expressions in
either patterns or actions. Actual parameters supplied in the function
call are used to instantiate the formal parameters declared in the
function. Arrays are passed by reference, other variables are passed
by value.
Since functions were not originally part of the AWK language, the pro-
vision for local variables is rather clumsy: They are declared as extra
parameters in the parameter list. The convention is to separate local
variables from real parameters by extra spaces in the parameter list.
For example:
function f(p, q,     a, b) # a and b are local
{
...
}
/abc/ { ... ; f(1, 2) ; ... }
The left parenthesis in a function call is required to immediately fol-
low the function name, without any intervening white space. This
avoids a syntactic ambiguity with the concatenation operator. This
restriction does not apply to the built-in functions listed above.
Functions may call each other and may be recursive. Function parame-
ters used as local variables are initialized to the null string and the
number zero upon function invocation.
Use return expr to return a value from a function. The return value is
undefined if no value is provided, or if the function returns by
“falling off” the end.
If --lint has been provided, gawk warns about calls to undefined func-
tions at parse time, instead of at run time. Calling an undefined
function at run time is a fatal error.
The word func may be used in place of function.
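The calling rules above (locals declared as extra parameters, scalars passed by value, arrays passed by reference) can be sketched like this. The function names fact, bump and push are hypothetical:

```shell
out_fn=$(awk '
function fact(n,    i, r) {          # i and r are locals, set apart by spaces
  r = 1
  for (i = 2; i <= n; i++) r *= i
  return r
}
function bump(x) { x++; return x }   # scalar argument: the caller keeps its value
function push(arr, v) {              # array argument: changes are visible outside
  arr["n"]++
  arr[arr["n"]] = v
}
BEGIN {
  print fact(5)
  k = 1; print bump(k), k
  push(stack, "x"); push(stack, "y")
  print stack["n"], stack[2]
}')
printf '%s\n' "$out_fn"
```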
DYNAMICALLY LOADING NEW FUNCTIONS
Beginning with version 3.1 of gawk, you can dynamically add new built-
in functions to the running gawk interpreter. The full details are
beyond the scope of this manual page; see GAWK: Effective AWK Program-
ming for the details.
extension(object, function)
Dynamically link the shared object file named by object, and
invoke function in that object, to perform initialization.
These should both be provided as strings. Returns the value
returned by function.
This function is provided and documented in GAWK: Effective AWK Pro-
gramming, but everything about this feature is likely to change eventu-
ally. We STRONGLY recommend that you do not use this feature for any-
thing that you aren’t willing to redo.
SIGNALS
pgawk accepts two signals. SIGUSR1 causes it to dump a profile and
function call stack to the profile file, which is either awkprof.out,
or whatever file was named with the --profile option. It then contin-
ues to run. SIGHUP causes pgawk to dump the profile and function call
stack and then exit.
EXAMPLES
Print and sort the login names of all users:
BEGIN { FS = ":" }
{ print $1 | "sort" }
Count lines in a file:
{ nlines++ }
END { print nlines }
Precede each line by its number in the file:
{ print FNR, $0 }
Concatenate and line number (a variation on a theme):
{ print NR, $0 }
Run an external command for particular lines of data:
tail -f access_log |
awk '/myhome.html/ { system("nmap " $1 ">> logdir/myhome.html") }'
INTERNATIONALIZATION
String constants are sequences of characters enclosed in double quotes.
In non-English speaking environments, it is possible to mark strings in
the AWK program as requiring translation to the native natural lan-
guage. Such strings are marked in the AWK program with a leading under-
score (“_”). For example,
gawk 'BEGIN { print "hello, world" }'
always prints hello, world. But,
gawk 'BEGIN { print _"hello, world" }'
might print bonjour, monde in France.
There are several steps involved in producing and running a localizable
AWK program.
1. Add a BEGIN action to assign a value to the TEXTDOMAIN variable to
set the text domain to a name associated with your program.
BEGIN { TEXTDOMAIN = "myprog" }
This allows gawk to find the .mo file associated with your program.
Without this step, gawk uses the messages text domain, which likely
does not contain translations for your program.
2. Mark all strings that should be translated with leading under-
scores.
3. If necessary, use the dcgettext() and/or bindtextdomain() functions
in your program, as appropriate.
4. Run gawk --gen-po -f myprog.awk > myprog.po to generate a .po file
for your program.
5. Provide appropriate translations, and build and install the corre-
sponding .mo files.
The internationalization features are described in full detail in GAWK:
Effective AWK Programming.
POSIX COMPATIBILITY
A primary goal for gawk is compatibility with the POSIX standard, as
well as with the latest version of UNIX awk. To this end, gawk incor-
porates the following user visible features which are not described in
the AWK book, but are part of the Bell Laboratories version of awk, and
are in the POSIX standard.
The book indicates that command line variable assignment happens when
awk would otherwise open the argument as a file, which is after the
BEGIN block is executed. However, in earlier implementations, when
such an assignment appeared before any file names, the assignment would
happen before the BEGIN block was run. Applications came to depend on
this “feature.” When awk was changed to match its documentation, the
-v option for assigning variables before program execution was added to
accommodate applications that depended upon the old behavior. (This
feature was agreed upon by both the Bell Laboratories and the GNU
developers.)
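The difference in timing can be observed with a sketch such as the following. The variable x is arbitrary, and "-" names standard input as the data file:

```shell
# With -v, the assignment is done before the BEGIN block runs.
out_v=$(awk -v x=1 'BEGIN { print "[" x "]" }')
printf '%s\n' "$out_v"

# As an operand, the assignment happens when awk reaches it in the
# argument list, which is after BEGIN but before "-" is read.
out_operand=$(echo line |
  awk 'BEGIN { printf "[%s] ", x } { print "[" x "]" }' x=1 -)
printf '%s\n' "$out_operand"
```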
The -W option for implementation specific features is from the POSIX
standard.
When processing arguments, gawk uses the special option “--” to signal
the end of arguments. In compatibility mode, it warns about but other-
wise ignores undefined options. In normal operation, such arguments
are passed on to the AWK program for it to process.
The AWK book does not define the return value of srand(). The POSIX
standard has it return the seed it was using, to allow keeping track of
random number sequences. Therefore srand() in gawk also returns its
current seed.
Other new features are: The use of multiple -f options (from MKS awk);
the ENVIRON array; the \a, and \v escape sequences (done originally in
gawk and fed back into the Bell Laboratories version); the tolower()
and toupper() built-in functions (from the Bell Laboratories version);
and the ANSI C conversion specifications in printf (done first in the
Bell Laboratories version).
HISTORICAL FEATURES
There are two features of historical AWK implementations that gawk sup-
ports. First, it is possible to call the length() built-in function
not only with no argument, but even without parentheses! Thus,
a = length # Holy Algol 60, Batman!
is the same as either of
a = length()
a = length($0)
This feature is marked as “deprecated” in the POSIX standard, and gawk
issues a warning about its use if --lint is specified on the command
line.
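The three spellings can be compared on one input line; this is only a quick check, and the input word is arbitrary.

```shell
# All three forms report the length of the current record ($0).
lengths=$(echo 'hello' | awk '{
  a = length        # historical form, no parentheses
  b = length()      # no argument
  c = length($0)    # explicit argument
  print a, b, c
}')
echo "$lengths"
```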
The other feature is the use of either the continue or the break state-
ments outside the body of a while, for, or do loop. Traditional AWK
implementations have treated such usage as equivalent to the next
statement. Gawk supports this usage if --traditional has been speci-
fied.
GNU EXTENSIONS
Gawk has a number of extensions to POSIX awk. They are described in
this section. All the extensions described here can be disabled by
invoking gawk with the --traditional or --posix options.
The following features of gawk are not available in POSIX awk.
· A path search for files named via the -f option, controlled by the
AWKPATH environment variable.
· The \x escape sequence. (Disabled with --posix.)
· The fflush() function. (Disabled with --posix.)
· The ability to continue lines after ? and :. (Disabled with
--posix.)
· Octal and hexadecimal constants in AWK programs.
· The ARGIND, BINMODE, ERRNO, LINT, RT and TEXTDOMAIN special variables.
· The IGNORECASE variable and its side-effects.
· The FIELDWIDTHS variable and fixed-width field splitting.
· The PROCINFO array.
· The use of RS as a regular expression.
· The special file names available for I/O redirection.
· The |& operator for creating co-processes.
· The ability to split out individual characters using the null string
as the value of FS, and as the third argument to split().
· The optional second argument to the close() function.
· The optional third argument to the match() function.
· The ability to use positional specifiers with printf and sprintf().
· The ability to pass an array to length().
· The use of delete array to delete the entire contents of an array.
· The use of nextfile to abandon processing of the current input file.
· The and(), asort(), asorti(), bindtextdomain(), compl(), dcgettext(),
dcngettext(), gensub(), lshift(), mktime(), or(), rshift(), strf-
time(), strtonum(), systime() and xor() functions.
· Localizable strings.
· Adding new built-in functions dynamically with the extension() func-
tion.
The AWK book does not define the return value of the close() function.
Gawk’s close() returns the value from fclose(3), or pclose(3), when
closing an output file or pipe, respectively. It returns the process’s
exit status when closing an input pipe. The return value is -1 if the
named file, pipe or co-process was not opened with a redirection.
When gawk is invoked with the --traditional option, if the fs argument
to the -F option is “t”, then FS is set to the tab character. Note
that typing gawk -F\t ... simply causes the shell to quote the “t,”
and does not pass “\t” to the -F option. Since this is a rather ugly
special case, it is not the default behavior. This behavior also does
not occur if --posix has been specified. To really get a tab character
as the field separator, it is best to use single quotes: gawk -F'\t'
....
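A minimal sketch of the single-quoted form recommended above. The sample input is made up: two tab-separated fields, each of which contains a space.

```shell
# With a tab as the separator, the space inside each field is preserved.
second=$(printf 'a b\tc d\n' | awk -F'\t' '{ print $2 }')
echo "$second"
```

With the default separator the same line would split into four fields, so $2 would be b.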
If gawk is configured with the --enable-switch option to the configure
command, then it accepts an additional control-flow statement:
switch (expression) {
case value|regex : statement
...
[ default: statement ]
}
If gawk is configured with the --disable-directories-fatal option, then
it will silently skip directories named on the command line. Other-
wise, it will do so only if invoked with the --traditional option.
ENVIRONMENT VARIABLES
The AWKPATH environment variable can be used to provide a list of
directories that gawk searches when looking for files named via the -f
and --file options.
If POSIXLY_CORRECT exists in the environment, then gawk behaves exactly
as if --posix had been specified on the command line. If --lint has
been specified, gawk issues a warning message to this effect.
SEE ALSO
egrep(1), getpid(2), getppid(2), getpgrp(2), getuid(2), geteuid(2),
getgid(2), getegid(2), getgroups(2)
The AWK Programming Language, Alfred V. Aho, Brian W. Kernighan, Peter
J. Weinberger, Addison-Wesley, 1988. ISBN 0-201-07981-X.
GAWK: Effective AWK Programming, Edition 3.0, published by the Free
Software Foundation, 2001. The current version of this document is
available online at http://www.gnu.org/software/gawk/manual.
BUGS
The -F option is not necessary given the command line variable assign-
ment feature; it remains only for backwards compatibility.
Syntactically invalid single character programs tend to overflow the
parse stack, generating a rather unhelpful message. Such programs are
surprisingly difficult to diagnose in the completely general case, and
the effort to do so really is not worth it.
AUTHORS
The original version of UNIX awk was designed and implemented by Alfred
Aho, Peter Weinberger, and Brian Kernighan of Bell Laboratories. Brian
Kernighan continues to maintain and enhance it.
Paul Rubin and Jay Fenlason, of the Free Software Foundation, wrote
gawk, to be compatible with the original version of awk distributed in
Seventh Edition UNIX. John Woods contributed a number of bug fixes.
David Trueman, with contributions from Arnold Robbins, made gawk com-
patible with the new version of UNIX awk. Arnold Robbins is the cur-
rent maintainer.
The initial DOS port was done by Conrad Kwok and Scott Garfinkle.
Scott Deifik is the current DOS maintainer. Pat Rankin did the port to
VMS, and Michal Jaegermann did the port to the Atari ST. The port to
OS/2 was done by Kai Uwe Rommel, with contributions and help from
Darrel Hankerson. Andreas Buening now maintains the OS/2 port. Fred
Fish supplied support for the Amiga, and Martin Brown provided the BeOS
port. Stephen Davies provided the original Tandem port, and Matthew
Woehlke provided changes for Tandem’s POSIX-compliant systems. Ralf
Wildenhues now maintains that port.
See the README file in the gawk distribution for current information
about maintainers and which ports are currently supported.
VERSION INFORMATION
This man page documents gawk, version 3.1.7.
BUG REPORTS
If you find a bug in gawk, please send electronic mail to bug-
gawk@gnu.org. Please include your operating system and its revision,
the version of gawk (from gawk --version), what C compiler you used to
compile it, and a test program and data that are as small as possible
for reproducing the problem.
Before sending a bug report, please do the following things. First,
verify that you have the latest version of gawk. Many bugs (usually
subtle ones) are fixed at each release, and if yours is out of date,
the problem may already have been solved. Second, please see if set-
ting the environment variable LC_ALL to LC_ALL=C causes things to
behave as you expect. If so, it’s a locale issue, and may or may not
really be a bug. Finally, please read this man page and the reference
manual carefully to be sure that what you think is a bug really is,
instead of just a quirk in the language.
Whatever you do, do NOT post a bug report in comp.lang.awk. While the
gawk developers occasionally read this newsgroup, posting bug reports
there is an unreliable way to report bugs. Instead, please use the
electronic mail addresses given above.
If you’re using a GNU/Linux system or BSD-based system, you may wish to
submit a bug report to the vendor of your distribution. That’s fine,
but please send a copy to the official email address as well, since
there’s no guarantee that the bug will be forwarded to the gawk main-
tainer.
ACKNOWLEDGEMENTS
Brian Kernighan of Bell Laboratories provided valuable assistance dur-
ing testing and debugging. We thank him.
COPYING PERMISSIONS
Copyright © 1989, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
2001, 2002, 2003, 2004, 2005, 2007, 2009 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of this
manual page provided the copyright notice and this permission notice
are preserved on all copies.
Permission is granted to copy and distribute modified versions of this
manual page under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.
Permission is granted to copy and distribute translations of this man-
ual page into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a trans-
lation approved by the Foundation.
Free Software Foundation Jul 10 2009 GAWK(1)
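Basic Linux Commands - awk #
awk is a pattern scanning and processing command: a utility that extracts and reshapes the contents of a file to produce the output you want. In other words, it finds the lines in a file that match a pattern and performs the specified action on them. Using a set of instructions defined by the user, awk compares a set of files, one line at a time, against the extended regular expressions the user supplies, and then applies the given action to every line that matches. Pattern matching in awk is more general than in grep, and several actions can be carried out on each input line. The awk programming language needs no compilation and provides variables, numeric functions, string functions, and logical operators. It is also often used in combination with sed.
Named after the initials of its original developers, Aho, Weinberger, and Kernighan, awk is a text-processing programming language created at Bell Laboratories (gawk is the GNU Project's implementation). It is available on UNIX-like operating systems and processes text input line by line and word by word to produce its output.
An awk program is basically a sequence of pattern and action pairs: awk reads the input one line at a time, checks whether the line satisfies the condition (typically a regular expression), and, if it does, runs the corresponding action on that line.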
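The basic forms of usage are as follows.
$ awk 'pattern' filename            # a condition
$ awk '{action}' filename           # which field to act on
$ awk 'pattern {action}' filename   # which field, under which condition
$ awk [ -F fs ] [ -v var=value ] [ 'prog' | -f progfile ] [ file ... ]
1. Records are separated by newlines, and fields by whitespace by default.
2. It is a programming language used for data processing and report generation.
3. Input can come from standard input, from several files, or from the output of another process.
4. It examines files line by line to search for user-specified patterns or to perform special tasks.
5. $0 is the entire record (all fields).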
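Usage is as follows.
$ awk -F delim '{ print $x }'
This is the form used most often: the awk command, a field delimiter, and what to print.
$ awk '{print $1}' file
This prints the first field of each line in the file. Fields are counted from the left edge of each line and separated by whitespace, so $1 is the leftmost field.
Suppose you want only the list of file names in a particular directory. You could redirect the output of ls to a file with > and edit it in vi, or copy the listing and press Enter by hand so that each name lands on its own line. awk makes this much easier. First, ls -al prints detailed information like the following.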
$ ls -al
-rw-r--r-- 1 kylee wheel 10235636 11 18 13:44 201611101584be7224a770
-rw-r--r-- 1 kylee wheel 1897350 11 18 13:44 201611101584be7224a770.gz
-rw-r--r-- 1 kylee wheel 70597 11 18 13:44 201611101584be7224a770.gz.csi
-rw-r--r-- 1 kylee wheel 10368400 11 18 13:44 201611101584be8191b560
-rw-r--r-- 1 kylee wheel 1927540 11 18 13:44 201611101584be8191b560.gz
-rw-r--r-- 1 kylee wheel 63101 11 18 13:44 201611101584be8191b560.gz.csi
-rw-r--r-- 1 kylee wheel 9907325 11 18 13:43 201611101584be8e59c610
-rw-r--r-- 1 kylee wheel 1849363 11 18 13:44 201611101584be8e59c610.gz
-rw-r--r-- 1 kylee wheel 77322 11 18 13:44 201611101584be8e59c610.gz.csi
-rw-r--r-- 1 kylee wheel 8893514 11 18 13:44 201611101584be99aab100
-rw-r--r-- 1 kylee wheel 1673036 11 18 13:44 201611101584be99aab100.gz
-rw-r--r-- 1 kylee wheel 66183 11 18 13:44 201611101584be99aab100.gz.csi
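You could copy the file names at the end of each line and edit them by hand, but that is tedious. Instead, simply run the following.
$ ls -al | awk '{ print $9}'
With this command, awk splits each line on whitespace. The file name is the last field, that is, the ninth, so writing its number as $9 prints the names cleanly, as shown below.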
201611101584be7224a770
201611101584be7224a770.gz
201611101584be7224a770.gz.csi
201611101584be8191b560
201611101584be8191b560.gz
201611101584be8191b560.gz.csi
201611101584be8e59c610
201611101584be8e59c610.gz
201611101584be8e59c610.gz.csi
201611101584be99aab100
201611101584be99aab100.gz
201611101584be99aab100.gz.csi
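Without resorting to any other method, you get exactly the result you want. You can also save it to a file with > for later use.
$ history | awk '{ print $2 }' | sort | uniq -c | sort -nr | head -3
This example is a simple(?) combination of commands that finds and prints the commands you have run most often on Linux. awk is especially handy with biological data, which often comes in fixed formats. Suppose you have a BED file like the one below. It cannot be used directly in gbrowse, but once converted to GFF3 format it can be viewed graphically in the browser. The example is a line from an Arabidopsis BED file; the command below converts BED to GFF without any separate programming.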
chr1 2025600 2027271 AT1G06620.10 + 2025617 2027094 0 3541,322,429,
$ awk '{print $1"\tawk\tmRNA\t"$2"\t"$3"\t"$5"\t"$6"\t0\t"$4}' example.bed
The file is then converted to the format you want and is ready to use.
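The conversion can also be sketched with OFS so that the tabs need not be written between every field. This is a variant under stated assumptions, not the command above: the input line is a made-up six-column BED record (chrom, start, end, name, score, strand), the start is shifted by one because BED starts are 0-based while GFF3 starts are 1-based, the phase column is written as "." for a non-CDS feature, and the name is emitted as an ID= attribute.

```shell
# Hypothetical BED record: chrom, start, end, name, score, strand.
bed_line=$(printf 'chr1\t100\t200\tgeneA\t0\t+')

# Emit the nine GFF3 columns: seqid, source, type, start, end,
# score, strand, phase, attributes.
gff_line=$(printf '%s\n' "$bed_line" | awk '
  BEGIN { FS = OFS = "\t" }   # read and write tab-separated fields
  { print $1, "awk", "mRNA", $2 + 1, $3, $5, $6, ".", "ID=" $4 }')

echo "$gff_line"
```

Whether the +1 shift and the ID= attribute are wanted depends on the tool that consumes the file, so check them against its documentation before relying on this form.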
Options #
- -F fs : sets the field separator; the default field separator is whitespace.
- -f file : reads the awk program from a script file.
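The two options can be combined, as in the following sketch. The file names users.txt and first.awk and their contents are made up for the illustration.

```shell
# Colon-separated sample data and a one-line awk program stored in a file.
tmpdir=$(mktemp -d)
printf 'alice:x:1000\nbob:x:1001\n' > "$tmpdir/users.txt"
printf '{ print $1 }\n' > "$tmpdir/first.awk"

# -F sets the field separator, -f reads the program from the file.
names=$(awk -F : -f "$tmpdir/first.awk" "$tmpdir/users.txt")

echo "$names"
rm -rf "$tmpdir"
```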
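Tips #
A few more usage examples follow.
$ awk '/Arabidopsis/{print $1, $2}' file
Prints the first and second fields of the lines in the file that contain Arabidopsis. That is, it finds the lines containing a particular word, splits them on whitespace, and prints the columns you need.
$ df | awk '$2 > 10000'
Prints those lines of the df output (which reports the mounted file systems) whose second field is greater than 10000.
$ date | awk '{print "Month : " $2 "\nYear : " $6}'
Prints the second and sixth fields of the output of date on separate lines; with the default C-locale output format these are the month and the year.
$ awk '{print NR, $1, $3}' file
Uses NR (a variable that increases by 1 after each record is processed) to print the record number together with the first and third fields of each line, so that every line is numbered.
$ awk -F : '/root/{print $0}' /etc/passwd
Lines are usually split on tabs or spaces, but when the data uses a different delimiter you can specify it after -F. Here fields are split on ":" and the lines containing root are printed. The pattern has the form /expression/, and a regular expression can be used between the slashes to find the pattern you want.
$ awk -F : '/^[ab]/{print $1}' file
Prints the first field of the lines that start with a or b.
$ awk -F '[\t]' '{print $1, $2, $3}' file
Splits fields on tabs and prints the first, second, and third fields. A file prepared in Excel is usually saved as TSV or CSV when converted to text, with \t (tab) or , (comma) as the delimiter; specifying that delimiter makes such files easy to work with.
$ awk '$1 ~ /[aA]TG/' seq.fasta
The ~ operator tests whether a record or field contains a match for a regular expression; here it prints the lines whose first field contains ATG or aTG.
$ awk '$1 !~ /\*$/' protein.fasta
Prints the lines whose first field does not end with *. Use it to find sequences that lack the terminating stop symbol.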
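The ~ and !~ operators from the last two tips can be tried on a small sample. The file protein.txt and its three sequences are made up for this sketch.

```shell
# Three toy sequences; two end with the stop symbol *.
tmpdir=$(mktemp -d)
printf 'MKV*\nMAAL\nMTT*\n' > "$tmpdir/protein.txt"

# Lines whose first field does NOT end with a literal asterisk.
no_stop=$(awk '$1 !~ /\*$/' "$tmpdir/protein.txt")

# Lines whose first field does end with one, prefixed by the record number.
with_stop=$(awk '$1 ~ /\*$/ { print NR ": " $1 }' "$tmpdir/protein.txt")

echo "$no_stop"
echo "$with_stop"
rm -rf "$tmpdir"
```

The asterisk must be escaped as \* inside the regular expression; unescaped, it is a repetition operator rather than a literal character.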
SED(1) User Commands SED(1)
NAME
sed - stream editor for filtering and transforming text
SYNOPSIS
sed [OPTION]... {script-only-if-no-other-script} [input-file]...
DESCRIPTION
Sed is a stream editor. A stream editor is used to perform basic text
transformations on an input stream (a file or input from a pipeline).
While in some ways similar to an editor which permits scripted edits
(such as ed), sed works by making only one pass over the input(s), and
is consequently more efficient. But it is sed’s ability to filter text
in a pipeline which particularly distinguishes it from other types of
editors.
-n, --quiet, --silent
suppress automatic printing of pattern space
-e script, --expression=script
add the script to the commands to be executed
-f script-file, --file=script-file
add the contents of script-file to the commands to be executed
--follow-symlinks
follow symlinks when processing in place; hard links will still
be broken.
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if extension supplied). The
default operation mode is to break symbolic and hard links.
This can be changed with --follow-symlinks and --copy.
-c, --copy
use copy instead of rename when shuffling files in -i mode.
While this will avoid breaking links (symbolic or hard), the
resulting editing operation is not atomic. This is rarely the
desired mode; --follow-symlinks is usually enough, and it is
both faster and more secure.
-l N, --line-length=N
specify the desired line-wrap length for the ‘l’ command
--posix
disable all GNU extensions.
-r, --regexp-extended
use extended regular expressions in the script.
-s, --separate
consider files as separate rather than as a single continuous
long stream.
-u, --unbuffered
load minimal amounts of data from the input files and flush the
output buffers more often
--help display this help and exit
--version
output version information and exit
If no -e, --expression, -f, or --file option is given, then the first
non-option argument is taken as the sed script to interpret. All
remaining arguments are names of input files; if no input files are
specified, then the standard input is read.
GNU sed home page: <http://www.gnu.org/software/sed/>. General help
using GNU software: <http://www.gnu.org/gethelp/>. E-mail bug reports
to: <bug-gnu-utils@gnu.org>. Be sure to include the word ‘‘sed’’ some-
where in the ‘‘Subject:’’ field.
COMMAND SYNOPSIS
This is just a brief synopsis of sed commands to serve as a reminder to
those who already know sed; other documentation (such as the texinfo
document) must be consulted for fuller descriptions.
Zero-address ‘‘commands’’
: label
Label for b and t commands.
#comment
The comment extends until the next newline (or the end of a -e
script fragment).
} The closing bracket of a { } block.
Zero- or One- address commands
= Print the current line number.
a \
text Append text, which has each embedded newline preceded by a back-
slash.
i \
text Insert text, which has each embedded newline preceded by a back-
slash.
q [exit-code]
Immediately quit the sed script without processing any more
input, except that if auto-print is not disabled the current
pattern space will be printed. The exit code argument is a GNU
extension.
Q [exit-code]
Immediately quit the sed script without processing any more
input. This is a GNU extension.
r filename
Append text read from filename.
R filename
Append a line read from filename. Each invocation of the com-
mand reads a line from the file. This is a GNU extension.
Commands which accept address ranges
{ Begin a block of commands (end with a }).
b label
Branch to label; if label is omitted, branch to end of script.
t label
If a s/// has done a successful substitution since the last
input line was read and since the last t or T command, then
branch to label; if label is omitted, branch to end of script.
T label
If no s/// has done a successful substitution since the last
input line was read and since the last t or T command, then
branch to label; if label is omitted, branch to end of script.
This is a GNU extension.
c \
text Replace the selected lines with text, which has each embedded
newline preceded by a backslash.
d Delete pattern space. Start next cycle.
D Delete up to the first embedded newline in the pattern space.
Start next cycle, but skip reading from the input if there is
still data in the pattern space.
h H Copy/append pattern space to hold space.
g G Copy/append hold space to pattern space.
x Exchange the contents of the hold and pattern spaces.
l List out the current line in a ‘‘visually unambiguous’’ form.
l width
List out the current line in a ‘‘visually unambiguous’’ form,
breaking it at width characters. This is a GNU extension.
n N Read/append the next line of input into the pattern space.
p Print the current pattern space.
P Print up to the first embedded newline of the current pattern
space.
s/regexp/replacement/
Attempt to match regexp against the pattern space. If success-
ful, replace that portion matched with replacement. The
replacement may contain the special character & to refer to that
portion of the pattern space which matched, and the special
escapes \1 through \9 to refer to the corresponding matching
sub-expressions in the regexp.
w filename
Write the current pattern space to filename.
W filename
Write the first line of the current pattern space to filename.
This is a GNU extension.
y/source/dest/
Transliterate the characters in the pattern space which appear
in source to the corresponding character in dest.
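A few of the commands summarized above, shown on tiny inputs. This is only a sketch; the sample strings are arbitrary.

```shell
# & stands for the whole matched text.
amp=$(echo 'cow' | sed 's/cow/s&s/')

# \1 and \2 refer to the first and second \( \) groups.
swapped=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/')

# y maps each character in the first list to the one in the second.
upper=$(echo 'abc' | sed 'y/abc/ABC/')

# Hold space: G appends the saved lines to each new line (except the
# first), h saves the result, and $p prints it at the last line,
# which reverses the input.
reversed=$(printf '1\n2\n3\n' | sed -n '1!G;h;$p')

echo "$amp"; echo "$swapped"; echo "$upper"; echo "$reversed"
```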
Addresses
Sed commands can be given with no addresses, in which case the command
will be executed for all input lines; with one address, in which case
the command will only be executed for input lines which match that
address; or with two addresses, in which case the command will be exe-
cuted for all input lines which match the inclusive range of lines
starting from the first address and continuing to the second address.
Three things to note about address ranges: the syntax is addr1,addr2
(i.e., the addresses are separated by a comma); the line which addr1
matched will always be accepted, even if addr2 selects an earlier line;
and if addr2 is a regexp, it will not be tested against the line that
addr1 matched.
After the address (or address-range), and before the command, a ! may
be inserted, which specifies that the command shall only be executed if
the address (or address-range) does not match.
The following address types are supported:
number Match only the specified line number.
first~step
Match every step’th line starting with line first. For example,
‘‘sed -n 1~2p’’ will print all the odd-numbered lines in the
input stream, and the address 2~5 will match every fifth line,
starting with the second. first can be zero; in this case, sed
operates as if it were equal to step. (This is an extension.)
$ Match the last line.
/regexp/
Match lines matching the regular expression regexp.
\cregexpc
Match lines matching the regular expression regexp. The c may
be any character.
GNU sed also supports some special 2-address forms:
0,addr2
Start out in "matched first address" state, until addr2 is
found. This is similar to 1,addr2, except that if addr2 matches
the very first line of input the 0,addr2 form will be at the end
of its range, whereas the 1,addr2 form will still be at the
beginning of its range. This works only when addr2 is a regular
expression.
addr1,+N
Will match addr1 and the N lines following addr1.
addr1,~N
Will match addr1 and the lines following addr1 until the next
line whose input line number is a multiple of N.
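The address forms above can be compared side by side. The sketch assumes GNU sed, because first~step, 0,addr2 and addr1,+N are GNU extensions; the inputs are arbitrary.

```shell
# first~step: every second line starting at line 1.
odd=$(printf '1\n2\n3\n4\n5\n6\n' | sed -n '1~2p')

# 0,/a/ may end on line 1, so only the first line is deleted.
from_zero=$(printf 'a\nb\na\n' | sed '0,/a/d')

# 1,/a/ does not test addr2 against line 1, so the range runs to line 3
# and every line is deleted.
from_one=$(printf 'a\nb\na\n' | sed '1,/a/d')

# addr1,+N: line 2 and the two lines after it.
plus=$(printf '1\n2\n3\n4\n5\n6\n' | sed -n '2,+2p')

# ! negates the address: delete everything outside lines 2-3.
kept=$(printf '1\n2\n3\n4\n' | sed '2,3!d')

echo "$odd"; echo "$from_zero"; echo "[$from_one]"; echo "$plus"; echo "$kept"
```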
REGULAR EXPRESSIONS
POSIX.2 BREs should be supported, but they aren’t completely because of
performance problems. The \n sequence in a regular expression matches
the newline character, and similarly for \a, \t, and other sequences.
BUGS
E-mail bug reports to bonzini@gnu.org. Be sure to include the word
‘‘sed’’ somewhere in the ‘‘Subject:’’ field. Also, please include the
output of ‘‘sed --version’’ in the body of your report if at all possi-
ble.
COPYRIGHT
Copyright © 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is
NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE, to the extent permitted by law.
SEE ALSO
awk(1), ed(1), grep(1), tr(1), perlre(1), sed.info, any of various
books on sed, the sed FAQ (http://sed.sf.net/grabbag/tutorials/sed-
faq.txt), http://sed.sf.net/grabbag/.
The full documentation for sed is maintained as a Texinfo manual. If
the info and sed programs are properly installed at your site, the com-
mand
info sed
should give you access to the complete manual.
sed version 4.2.1 March 2012 SED(1)
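sed 's/search-string/replacement-string/g' input-file
Basic Linux Commands - sed #
# sed "s/hello/goodbye/" in.file
This command replaces the first occurrence of the string hello on each line of the file in.file with goodbye,
and then writes the line to standard output.
# echo "1234hello5678" | sed "s/hello/goodbye/"
The substitution command should be enclosed in quotation marks so that the shell passes it to sed unchanged. The search string may also be a regular expression.
sed supports several other operations as well. The following command deletes every line
that contains the string hello.
# sed "/hello/d" in.file
The command means "find the lines that contain the string hello and delete them."
This sed command is equivalent to the following command.
# grep -v hello in.file
To delete only the string hello rather than the whole line, use the following command.
# sed "s/hello//" in.file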
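The commands above can be tried on a throwaway file. The contents of in.file here are made up for the sketch.

```shell
# A line with two occurrences of hello, and a line with none.
tmpdir=$(mktemp -d)
printf 'say hello hello\nnothing here\n' > "$tmpdir/in.file"

# Without the g flag only the first hello on each line is replaced.
first_only=$(sed 's/hello/goodbye/' "$tmpdir/in.file")

# Deleting matching lines with sed gives the same result as grep -v.
deleted=$(sed '/hello/d' "$tmpdir/in.file")
grepped=$(grep -v hello "$tmpdir/in.file")

echo "$first_only"
echo "$deleted"
rm -rf "$tmpdir"
```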
As in ed, sed lets you specify a range of lines when you want to work on only part of a file.
# sed "3,7s/hello//" in.file
This command deletes the first hello on each of lines 3 through 7 of in.file and leaves
the rest of the file unchanged. You can also specify the range by context instead of
line numbers, as follows.
# sed "/hello/,/goodbye/s/bad/good/g" in.file
This command changes every bad to good from the first line containing the word hello through
the next line containing the word goodbye.
If another hello appears after goodbye has been found, the substitution resumes
until the next goodbye.
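The restart behaviour of a context range can be seen on a six-line sample (the lines are made up for this sketch).

```shell
# Lines 2-4 form the first hello..goodbye range; line 6 opens a second one.
ranged=$(printf 'bad\nhello bad\nbad\ngoodbye bad\nbad\nhello bad\n' |
  sed '/hello/,/goodbye/s/bad/good/g')
echo "$ranged"
```

Lines 1 and 5 lie outside any range and keep their bad; the last line is changed because a new range starts there even though no closing goodbye follows.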
sed is more powerful than what we have seen so far.
With the -f (file) option, the commands can be stored in a file and reused instead of
being typed at the keyboard each time.
# sed -f command.file in.file
Such a command file is useful when several commands are often used together.
For example, suppose the file contains the following commands.
# vi command.file
s/hello/goodbye/
s/good/bad/
Then entering the following command
# echo "1234hello5678" | sed -f command.file
produces the following output.
# echo "1234hello5678" | sed -f command.file
1234badbye5678
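The same run can be reproduced without an editor by writing the command file with printf. The temporary directory is the only addition in this sketch.

```shell
# Write the two substitutions to a command file, one per line.
tmpdir=$(mktemp -d)
printf 's/hello/goodbye/\ns/good/bad/\n' > "$tmpdir/command.file"

# The second command acts on the output of the first:
# hello -> goodbye, then the good inside goodbye -> bad.
result=$(echo '1234hello5678' | sed -f "$tmpdir/command.file")

echo "$result"
rm -rf "$tmpdir"
```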
o sed basics
# sed '' ljs --> same as cat ljs
o sed editing commands
Common sed commands
---------------------------------------------------------------------------------------
a\           Appends the following line(s) after the addressed lines
c\           Changes the addressed lines to the following line(s) (replaces the lines)
d            Deletes the addressed lines
g            Flag of the s command: replaces every match on the line, not just the first
i\           Inserts the following line(s) before the addressed lines
p            Prints the line, even under the -n option
q            Quits when the specified line is reached
r filename   Reads filename and appends its contents to the output
s/old/new/   Replaces "old" with "new"
=            Prints the line number
!command     Applies command to the lines that are not selected
--------------------------------------------------------------------------------------
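o Specifying lines
sed offers two ways to specify lines. The first is to give the address as a number.
You can use a single number to refer to a particular line.
# sed '3d' ljs --> deletes the third line
Or you can use two numbers separated by a comma (,) to refer to a range of lines.
# sed '2,4 s/e/#/' ljs --> the substitution applies only to lines 2-4. (Remember that a plain substitution
applies only to the first occurrence on a line, so only the first e of each
addressed line is replaced with #.)
The second is to give the address as a pattern enclosed in slashes.
# sed -n '/kingdom/p' ljs --> prints only the lines containing kingdom
# sed '/kingdom/p' ljs --> prints every line, and prints the lines containing kingdom twice
# sed '/[Pp]rincess/d' ljs --> deletes the lines containing princess or Princess
# sed '1,/fragrant/d' ljs --> deletes every line from line 1 through the first line
containing fragrant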
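o sed command highlights
# more ljs
I am a boy
You are a girl
He is a doctor
# sed 'a\\
Hey la la\! Doo de dah\!' ljs --> appends "Hey la la! Doo de dah!" after each line (csh quoting; in sh or bash write a\ with a single backslash and ! without a backslash)
I am a boy
Hey la la! Doo de dah!
You are a girl
Hey la la! Doo de dah!
He is a doctor
Hey la la! Doo de dah!
# sed 'a\\
Oh\! good\\ --> ending a text line with \\ lets you append more than one line
yeh' ljs
# sed '3a\\
Good Morning' ljs --> inserts the text after line 3
# sed 'c\\
Oh marvelous delight! sing to me! ' ljs --> replaces each existing line with this one
Oh marvelous delight! sing to me!
Oh marvelous delight! sing to me!
Oh marvelous delight! sing to me!
# sed '2q' ljs = sed 2q ljs --> the q command makes the editor stop once it has reached the specified line,
so only the first 2 lines are shown
# sed -n '1s/a/#/gp' ljs --> replaces every a on line 1 with # and prints that line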
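For readers using sh or bash rather than csh, the append and quit commands can be sketched as follows. Inside single quotes these shells need only one backslash after a, and ! needs no escaping. The sample lines are taken from the ljs file shown above.

```shell
# a\ followed by a newline and the text appends that text after each line.
appended=$(printf 'I am a boy\nHe is a doctor\n' | sed 'a\
Hey la la!')

# 2q prints lines 1 and 2, then stops reading input.
first_two=$(printf '1\n2\n3\n' | sed '2q')

echo "$appended"
echo "$first_two"
```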
o sed pattern matching
sed metacharacters for pattern matching
-------------------------------------------------------------------
Metacharacter    Action
-------------------------------------------------------------------
\                Removes the special meaning of the following character
^                Matches the beginning of a line
$                Matches the end of a line
.                Matches any single character
[ ]              Matches any one of the enclosed characters
[^...]           Matches any character not in the list ...
pat*             Matches zero or more occurrences of pat,
                 where pat is a single character or a [ ] pattern
&                Used in the newpattern part of an s command to
                 stand for the text matched by the oldpattern part
-------------------------------------------------------------------
o Brief examples
----------------------------------------------------------------------------------------
Command       Result
----------------------------------------------------------------------------------------
/Second/      Matches any line containing Second.
/^Second/     Matches any line beginning with Second.
/^$/          Matches an empty line, that is, a line with nothing between its beginning and its end.
              It does not match a line made up of blank spaces, because a space is itself
              a character.
/c.t/         Matches lines containing cat, cot, and so on. Note that the pattern may be part of
              a word; for example, apricot and acute also match.
/./           Matches lines containing at least one character.
/\./          Matches lines containing a period. The \ removes the special meaning of the .
/s[oa]p/      Matches sop or sap, but not sip or sup.
/s[^oa]p/     Matches sip or sup, but not sop or sap.
s/cow/s&s/    Replaces cow with scows.
/co*t/        Matches c, then any number of o's (including none), then t: ct, cot, coot, and so on.
----------------------------------------------------------------------------------------
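Some of the patterns in the table can be checked against a short word list. The list is made up for this sketch.

```shell
# Print only the words that match each pattern.
matches() {
  printf '%s\n' cat apricot sip sop ct coot | sed -n "/$1/p"
}

c_dot_t=$(matches 'c.t')       # c, any one character, t
not_oa=$(matches 's[^oa]p')    # s, a character other than o or a, p
star=$(matches 'co*t')         # c, zero or more o, t

echo "$c_dot_t"; echo "$not_oa"; echo "$star"
```

Note that cat matches /c.t/ but not /co*t/, while ct and coot match /co*t/ but not /c.t/.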
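o Simple sed solutions
# sed '/^$/d' ljs --> removes all empty lines
# sed '/^ *$/d' ljs --> also removes lines consisting only of spaces (careful: there must be a space between ^ and *)
# sed 'a\\
' ljs --> adds a blank line after each line
# sed '/^#/d' ljs --> removes lines with # in the first column
# sed 's/^/     /' ljs --> inserts 5 spaces at the start of each line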
o Multiple commands
# sed 's/Bob/Robert/g\
s/Pat/Patricia/g' ljs --> when using sh, omit the \
# sed 's/cat/dog/g\
s/dog/pigs/g' ljs --> first converts every cat to dog, then every dog to pigs
# sed 's/Bob/Robert/g\
s/Pat[^a-z]/Patricia/g' ljs --> recall that [^a-z] means any character other than
the letters a through z
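Because the commands run in order on each line, earlier substitutions feed later ones. The sketch below uses -e to pass the commands, which is equivalent to putting them on separate lines; the input strings are made up.

```shell
# cat becomes dog first, then every dog (old and new) becomes pigs.
chained=$(echo 'cat and dog' | sed -e 's/cat/dog/g' -e 's/dog/pigs/g')

# [^a-z] consumes the character after Pat, so the ! disappears.
lost=$(echo 'Pat! Patty' | sed 's/Pat[^a-z]/Patricia/g')

echo "$chained"
echo "$lost"
```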
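o Tags
When something like Pat! is found above, the whole string including the ! is replaced with Patricia, so the ! is lost.
We need a way to replace Pat while keeping the !, and we can do this with tags.
To "tag" part of a pattern, enclose it between \( on the left and \) on the right.
Then, in the newpattern part of the command, you can refer to the first such enclosed pattern as \1,
the second as \2, and so on. Using this method gives the following command.
# sed 's/\(Pat\)\([^a-z]\)/\1ricia\2/g' ljs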
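Applied to a made-up input line, the tagged command keeps the character that follows Pat:

```shell
# \1 holds Pat and \2 holds the following non-lowercase character.
kept=$(echo 'Pat! Patty' | sed 's/\(Pat\)\([^a-z]\)/\1ricia\2/g')
echo "$kept"
```

Patty is left alone because the character after Pat is a lowercase letter, so the pattern does not match there.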
o Shell scripts and sed
# vi twospace
sed 'a\\
' $* --> $* stands for all the arguments
# twospace ljs | pr | lpr
The example above shows how well sed fits into UNIX programming and shell scripts.
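A variant of the twospace script for sh or bash, written as a function so it can be tried without creating a file. Two things differ from the script above and are substitutions made for this sketch: "$@" replaces $* so that file names containing spaces stay intact, and the G command (which appends the empty hold space, and therefore a blank line) replaces a\ with empty text.

```shell
# Double-space the named files, or standard input when none are given.
twospace() {
  sed G "$@"
}

spaced=$(printf 'a\nb\n' | twospace)
echo "$spaced"
```

As in the original, the output can be piped on to other tools, for example twospace ljs | pr.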
VIM(1) VIM(1)
NAME
vim - Vi IMproved, a programmers text editor
SYNOPSIS
vim [options] [file ..]
vim [options] -
vim [options] -t tag
vim [options] -q [errorfile]
ex gex
view
gvim gview vimx evim eview
rvim rview rgvim rgview
DESCRIPTION
Vim is a text editor that is upwards compatible to Vi. It can be used
to edit all kinds of plain text. It is especially useful for editing
programs.
There are a lot of enhancements above Vi: multi level undo, multi win-
dows and buffers, syntax highlighting, command line editing, filename
completion, on-line help, visual selection, etc.. See ":help
vi_diff.txt" for a summary of the differences between Vim and Vi.
While running Vim a lot of help can be obtained from the on-line help
system, with the ":help" command. See the ON-LINE HELP section below.
Most often Vim is started to edit a single file with the command
vim file
More generally Vim is started with:
vim [options] [filelist]
If the filelist is missing, the editor will start with an empty buffer.
Otherwise exactly one out of the following four may be used to choose
one or more files to be edited.
file .. A list of filenames. The first one will be the current
file and read into the buffer. The cursor will be posi-
tioned on the first line of the buffer. You can get to the
other files with the ":next" command. To edit a file that
starts with a dash, precede the filelist with "--".
- The file to edit is read from stdin. Commands are read
from stderr, which should be a TTY.
-t {tag} The file to edit and the initial cursor position depends on
a "tag", a sort of goto label. {tag} is looked up in the
tags file, the associated file becomes the current file and
the associated command is executed. Mostly this is used
for C programs, in which case {tag} could be a function
name. The effect is that the file containing that function
becomes the current file and the cursor is positioned on
the start of the function. See ":help tag-commands".
-q [errorfile]
Start in quickFix mode. The file [errorfile] is read and
the first error is displayed. If [errorfile] is omitted,
the filename is obtained from the ’errorfile’ option
(defaults to "AztecC.Err" for the Amiga, "errors.err" on
other systems). Further errors can be jumped to with the
":cn" command. See ":help quickfix".
Vim behaves differently, depending on the name of the command (the exe-
cutable may still be the same file).
vim The "normal" way, everything is default.
ex Start in Ex mode. Go to Normal mode with the ":vi" command.
Can also be done with the "-e" argument.
view Start in read-only mode. You will be protected from writing
the files. Can also be done with the "-R" argument.
gvim gview
The GUI version. Starts a new window.
gex Starts a new gvim window in Ex mode. Can also be done with
the "-e" argument to gvim
vimx Starts gvim in "Vi" mode similar to "vim", but with addi-
tional features like xterm clipboard support
evim eview
The GUI version in easy mode. Starts a new window. Can also
be done with the "-y" argument.
rvim rview rgvim rgview
Like the above, but with restrictions. It will not be possi-
ble to start shell commands, or suspend Vim. Can also be
done with the "-Z" argument.
OPTIONS
The options may be given in any order, before or after filenames.
Options without an argument can be combined after a single dash.
+[num] For the first file the cursor will be positioned on line
"num". If "num" is missing, the cursor will be positioned
on the last line.
+/{pat} For the first file the cursor will be positioned on the
first occurrence of {pat}. See ":help search-pattern" for
the available search patterns.
+{command}
-c {command}
{command} will be executed after the first file has been
read. {command} is interpreted as an Ex command. If the
{command} contains spaces it must be enclosed in double
quotes (this depends on the shell that is used). Example:
Vim "+set si" main.c
Note: You can use up to 10 "+" or "-c" commands.
-S {file} {file} will be sourced after the first file has been read.
This is equivalent to -c "source {file}". {file} cannot
start with ’-’. If {file} is omitted "Session.vim" is used
(only works when -S is the last argument).
--cmd {command}
Like using "-c", but the command is executed just before
processing any vimrc file. You can use up to 10 of these
commands, independently from "-c" commands.
-A If Vim has been compiled with ARABIC support for editing
right-to-left oriented files and Arabic keyboard mapping,
this option starts Vim in Arabic mode, i.e. ’arabic’ is
set. Otherwise an error message is given and Vim aborts.
-b Binary mode. A few options will be set that make it
possible to edit a binary or executable file.
-C Compatible. Set the ’compatible’ option. This will make
Vim behave mostly like Vi, even though a .vimrc file
exists.
-d Start in diff mode. There should be two, three or four
file name arguments. Vim will open all the files and show
differences between them. Works like vimdiff(1).
-d {device} Open {device} for use as a terminal. Only on the Amiga.
Example: "-d con:20/30/600/150".
-D Debugging. Go to debugging mode when executing the first
command from a script.
-e Start Vim in Ex mode, just like the executable was called
"ex".
-E Start Vim in improved Ex mode, just like the executable was
called "exim".
-f Foreground. For the GUI version, Vim will not fork and
detach from the shell it was started in. On the Amiga, Vim
is not restarted to open a new window. This option should
be used when Vim is executed by a program that will wait
for the edit session to finish (e.g. mail). On the Amiga
the ":sh" and ":!" commands will not work.
--nofork Foreground. For the GUI version, Vim will not fork and
detach from the shell it was started in.
-F If Vim has been compiled with FKMAP support for editing
right-to-left oriented files and Farsi keyboard mapping,
this option starts Vim in Farsi mode, i.e. ’fkmap’ and
’rightleft’ are set. Otherwise an error message is given
and Vim aborts.
-g If Vim has been compiled with GUI support, this option
enables the GUI. If no GUI support was compiled in, an
error message is given and Vim aborts.
-h Give a bit of help about the command line arguments and
options. After this Vim exits.
-H If Vim has been compiled with RIGHTLEFT support for editing
right-to-left oriented files and Hebrew keyboard mapping,
this option starts Vim in Hebrew mode, i.e. ’hkmap’ and
’rightleft’ are set. Otherwise an error message is given
and Vim aborts.
-i {viminfo}
When using the viminfo file is enabled, this option sets
the filename to use, instead of the default "~/.viminfo".
This can also be used to skip the use of the .viminfo file,
by giving the name "NONE".
-L Same as -r.
-l Lisp mode. Sets the ’lisp’ and ’showmatch’ options on.
-m Modifying files is disabled. Resets the ’write’ option.
You can still modify the buffer, but writing a file is not
possible.
-M Modifications not allowed. The ’modifiable’ and ’write’
options will be unset, so that changes are not allowed and
files can not be written. Note that these options can be
set to enable making modifications.
-N No-compatible mode. Reset the ’compatible’ option. This
will make Vim behave a bit better, but less Vi compatible,
even though a .vimrc file does not exist.
-n No swap file will be used. Recovery after a crash will be
impossible. Handy if you want to edit a file on a very
slow medium (e.g. floppy). Can also be done with ":set
uc=0". Can be undone with ":set uc=200".
-nb Become an editor server for NetBeans. See the docs for
details.
-o[N] Open N windows stacked. When N is omitted, open one window
for each file.
-O[N] Open N windows side by side. When N is omitted, open one
window for each file.
-p[N] Open N tab pages. When N is omitted, open one tab page for
each file.
-R Read-only mode. The ’readonly’ option will be set. You
can still edit the buffer, but will be prevented from acci-
dentally overwriting a file. If you do want to overwrite a
file, add an exclamation mark to the Ex command, as in
":w!". The -R option also implies the -n option (see
above). The ’readonly’ option can be reset with ":set
noro". See ":help ’readonly’".
-r List swap files, with information about using them for
recovery.
-r {file} Recovery mode. The swap file is used to recover a crashed
editing session. The swap file is a file with the same
filename as the text file with ".swp" appended (on Unix the
name also gets a leading dot, e.g. ".main.c.swp"). See
":help recovery".
-s Silent mode. Only when started as "Ex" or when the "-e"
option was given before the "-s" option.
-s {scriptin}
The script file {scriptin} is read. The characters in the
file are interpreted as if you had typed them. The same
can be done with the command ":source! {scriptin}". If the
end of the file is reached before the editor exits, further
characters are read from the keyboard.
-T {terminal}
Tells Vim the name of the terminal you are using. Only
required when the automatic way doesn’t work. Should be a
terminal known to Vim (builtin) or defined in the termcap
or terminfo file.
-u {vimrc} Use the commands in the file {vimrc} for initializations.
All the other initializations are skipped. Use this to
edit a special kind of file. It can also be used to skip
all initializations by giving the name "NONE". See ":help
initialization" within vim for more details.
-U {gvimrc} Use the commands in the file {gvimrc} for GUI initializa-
tions. All the other GUI initializations are skipped. It
can also be used to skip all GUI initializations by giving
the name "NONE". See ":help gui-init" within vim for more
details.
-V[N] Verbose. Give messages about which files are sourced and
for reading and writing a viminfo file. The optional num-
ber N is the value for ’verbose’. Default is 10.
-v Start Vim in Vi mode, just like the executable was called
"vi". This only has effect when the executable is called
"ex".
-w {scriptout}
All the characters that you type are recorded in the file
{scriptout}, until you exit Vim. This is useful if you
want to create a script file to be used with "vim -s" or
":source!". If the {scriptout} file exists, characters are
appended.
-W {scriptout}
Like -w, but an existing file is overwritten.
-x Use encryption when writing files. Will prompt for a crypt
key.
-X Don’t connect to the X server. Shortens startup time in a
terminal, but the window title and clipboard will not be
used.
-y Start Vim in easy mode, just like the executable was called
"evim" or "eview". Makes Vim behave like a click-and-type
editor.
-Z Restricted mode. Works as if the name of the executable
started with "r".
-- Denotes the end of the options. Arguments after this will
be handled as a file name. This can be used to edit a
filename that starts with a ’-’.
--echo-wid GTK GUI only: Echo the Window ID on stdout.
--help Give a help message and exit, just like "-h".
--literal Take file name arguments literally, do not expand wild-
cards. This has no effect on Unix where the shell expands
wildcards.
--noplugin Skip loading plugins. Implied by -u NONE.
--remote Connect to a Vim server and make it edit the files given in
the rest of the arguments. If no server is found a warning
is given and the files are edited in the current Vim.
--remote-expr {expr}
Connect to a Vim server, evaluate {expr} in it and print
the result on stdout.
--remote-send {keys}
Connect to a Vim server and send {keys} to it.
--remote-silent
As --remote, but without the warning when no server is
found.
--remote-wait
As --remote, but Vim does not exit until the files have
been edited.
--remote-wait-silent
As --remote-wait, but without the warning when no server is
found.
--remote-tab[-wait][-silent]
As --remote, but use one tab page per file.
--role Set a unique role to identify the main window.
--serverlist
List the names of all Vim servers that can be found.
--servername {name}
Use {name} as the server name. Used for the current Vim,
unless used with a --remote argument, then it’s the name of
the server to connect to.
--socketid {id}
GTK GUI only: Use the GtkPlug mechanism to run gvim in
another window.
--version Print version information and exit.
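Several of the options above combine into a non-interactive edit. The following is a minimal sketch, assuming a vim binary is on the PATH; the temporary directory and the file name "-notes.txt" are made up for illustration, and the file name starts with a dash on purpose to show the "--" argument.

```shell
#!/bin/sh
# Create a scratch directory with a file whose name starts with '-'.
workdir=$(mktemp -d)
printf 'foo one\nfoo two\n' > "$workdir/-notes.txt"

if command -v vim >/dev/null 2>&1; then
    # -e -s    Ex mode, silent ("-e" has to come before "-s")
    # -u NONE  skip all initializations
    # -c ...   Ex commands executed after the first file has been read
    # --       end of options, so "-notes.txt" is handled as a file name
    ( cd "$workdir" &&
      vim -e -s -u NONE -c '%s/foo/bar/' -c 'wq' -- -notes.txt </dev/null )
else
    echo "vim not found, file left unchanged" >&2
fi

# Show the result; "./" keeps cat from reading the name as an option.
( cd "$workdir" && cat ./-notes.txt )
```

Without the "--", Vim would try to interpret "-notes.txt" as a group of options. The scratch directory in $workdir can be removed afterwards.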
ON-LINE HELP
Type ":help" in Vim to get started. Type ":help subject" to get help
on a specific subject. For example: ":help ZZ" to get help for the
"ZZ" command. Use <Tab> and CTRL-D to complete subjects (":help cmd-
line-completion"). Tags are present to jump from one place to another
(sort of hypertext links, see ":help"). All documentation files can be
viewed in this way, for example ":help syntax.txt".
FILES
/usr/share/vim/vim74/doc/*.txt
The Vim documentation files. Use ":help doc-file-list"
to get the complete list.
/usr/share/vim/vim74/doc/tags
The tags file used for finding information in the docu-
mentation files.
/usr/share/vim/vim74/syntax/syntax.vim
System wide syntax initializations.
/usr/share/vim/vim74/syntax/*.vim
Syntax files for various languages.
/etc/vimrc System wide Vim initializations.
~/.vimrc Your personal Vim initializations.
/etc/gvimrc System wide gvim initializations.
~/.gvimrc Your personal gvim initializations.
/usr/share/vim/vim74/optwin.vim
Script used for the ":options" command, a nice way to
view and set options.
/usr/share/vim/vim74/menu.vim
System wide menu initializations for gvim.
/usr/share/vim/vim74/bugreport.vim
Script to generate a bug report. See ":help bugs".
/usr/share/vim/vim74/filetype.vim
Script to detect the type of a file by its name. See
":help ’filetype’".
/usr/share/vim/vim74/scripts.vim
Script to detect the type of a file by its contents.
See ":help ’filetype’".
/usr/share/vim/vim74/print/*.ps
Files used for PostScript printing.
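To see which of the initialization files listed above are present on a given machine, a short check such as the following can be used. This is only a sketch: the function name report_init_files is hypothetical, and the system-wide paths are the ones from this man page, which differ between distributions.

```shell
#!/bin/sh
# Hypothetical helper: report which of the vimrc/gvimrc files named
# in the FILES list exist on this system.
report_init_files() {
    for f in /etc/vimrc "$HOME/.vimrc" /etc/gvimrc "$HOME/.gvimrc"; do
        if [ -e "$f" ]; then
            status="present"
        else
            status="missing"
        fi
        printf '%-20s %s\n' "$f" "$status"
    done
}

report_init_files
```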
For recent info read the VIM home page:
<URL:http://www.vim.org/>
SEE ALSO
vimtutor(1)
AUTHOR
Most of Vim was made by Bram Moolenaar, with a lot of help from others.
See ":help credits" in Vim.
Vim is based on Stevie, worked on by Tim Thompson, Tony Andrews and
G.R. (Fred) Walter, although hardly any of the original code remains.
BUGS
Probably. See ":help todo" for a list of known problems.
Note that a number of things that some may regard as bugs are
in fact caused by a too-faithful reproduction of Vi’s behaviour. And
if you think other things are bugs "because Vi does it differently",
you should take a closer look at the vi_diff.txt file (or type :help
vi_diff.txt when in Vim). Also have a look at the ’compatible’ and
’cpoptions’ options.
2006 Apr 11 VIM(1)
Reference site (source)
http://www.incodom.kr/Linux/%EA%B8%B0%EB%B3%B8%EB%AA%85%EB%A0%B9%EC%96%B4