tomb

the crypto undertaker
git clone git://parazyd.org/tomb.git

shocco


      1 #!/bin/sh
      2 # **shocco** is a quick-and-dirty, literate-programming-style documentation
      3 # generator written for and in __POSIX shell__. It borrows liberally from
      4 # [Docco][do], the original Q&D literate-programming-style doc generator.
      5 #
      6 # `shocco(1)` reads shell scripts and produces annotated source documentation
      7 # in HTML format. Comments are formatted with Markdown and presented
      8 # alongside syntax highlighted code so as to give an annotation effect. This
      9 # page is the result of running `shocco` against [its own source file][sh].
     10 #
     11 # shocco is built with `make(1)` and installs under `/usr/local` by default:
     12 #
     13 #     git clone git://github.com/rtomayko/shocco.git
     14 #     cd shocco
     15 #     make
     16 #     sudo make install
     17 #     # or just copy 'shocco' wherever you need it
     18 #
     19 # Once installed, the `shocco` program can be used to generate documentation
     20 # for a shell script:
     21 #
     22 #     shocco shocco.sh
     23 #
     24 # The generated HTML is written to `stdout`.
     25 #
     26 # [do]: http://jashkenas.github.com/docco/
     27 # [sh]: https://github.com/rtomayko/shocco/blob/master/shocco.sh#commit
     28 
     29 # Usage and Prerequisites
     30 # -----------------------
     31 
     32 # The most important line in any shell program.
     33 set -e
     34 
     35 # There's a lot of different ways to do usage messages in shell scripts.
     36 # This is my favorite: you write the usage message in a comment --
     37 # typically right after the shebang line -- *BUT*, use a special comment prefix
     38 # like `#/` so that it's easy to pull these lines out.
     39 #
     40 # This also illustrates one of shocco's corner features. Only comment lines
     41 # padded with a space are considered documentation. A `#` followed by any
     42 # other character is considered code.
     43 #
     44 #/ Usage: shocco [-t <title>] [<source>]
     45 #/ Create literate-programming-style documentation for shell scripts.
     46 #/
     47 #/ The shocco program reads a shell script from <source> and writes
     48 #/ generated documentation in HTML format to stdout. When <source> is
     49 #/ '-' or not specified, shocco reads from stdin.
     50 
     51 # This is the second part of the usage message technique: `grep` yourself
     52 # for the usage message comment prefix and then cut off the first few
     53 # characters so that everything lines up.
     54 expr -- "$*" : ".*--help" >/dev/null && {
     55     grep '^#/' <"$0" | cut -c4-
     56     exit 0
     57 }
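        # For example, `shocco --help` prints the `#/` block above with the
        # `#/ ` prefix cut off:
        #
        #     Usage: shocco [-t <title>] [<source>]
        #     Create literate-programming-style documentation for shell scripts.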
     58 
     59 # A custom title may be specified with the `-t` option. We use the filename
     60 # as the title if none is given.
     61 test "$1" = '-t' && {
     62     title="$2"
     63     shift;shift
     64 }
     65 
     66 # Next argument should be the `<source>` file. Grab it, and use its basename
     67 # as the title if none was given with the `-t` option.
     68 file="$1"
     69 : ${title:=$(basename "$file")}
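        # (The leading `:` is the shell's no-op command, so this line exists only
        # for the side effect of `${title:=...}`, which assigns the basename as
        # the default when `title` is unset or empty.)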
     70 
     71 # These are replaced with the full paths to real utilities by the
     72 # configure/make system.
     73 MARKDOWN='/usr/bin/markdown_py'
     74 PYGMENTIZE='/usr/bin/pygmentize'
     75 
     76 # On GNU systems, csplit doesn't elide empty files by default:
     77 CSPLITARGS=$( (csplit --version 2>/dev/null | grep -i gnu >/dev/null) && echo "--elide-empty-files" || true )
     78 
     79 # We're going to need a `markdown` command to run comments through. This can
     80 # be [Gruber's `Markdown.pl`][md] (included in the shocco distribution) or
     81 # [Discount's][ds] super fast `markdown(1)` in C. Try to figure out if either is
     82 # available and then bail if we can't find anything.
     83 #
     84 # [md]: http://daringfireball.net/projects/markdown/
     85 # [ds]: http://www.pell.portland.or.us/~orc/Code/discount/
     86 command -v "$MARKDOWN" >/dev/null || {
     87     if command -v Markdown.pl >/dev/null
     88     then MARKDOWN='Markdown.pl'
     89     elif test -f "$(dirname "$0")/Markdown.pl"
     90     then MARKDOWN="perl $(dirname "$0")/Markdown.pl"
     91     else echo "$(basename $0): markdown command not found." 1>&2
     92          exit 1
     93     fi
     94 }
     95 
     96 # Check that [Pygments][py] is installed for syntax highlighting.
     97 #
     98 # This is a fairly hefty prerequisite. Eventually, I'd like to fall back
     99 # on a simple non-highlighting preformatter when Pygments isn't available. For
    100 # now, just bail out if we can't find the `pygmentize` program.
    101 #
    102 # [py]: http://pygments.org/
    103 command -v "$PYGMENTIZE" >/dev/null || {
    104     echo "$(basename $0): pygmentize command not found." 1>&2
    105     exit 1
    106 }
    107 
    108 # Work and Cleanup
    109 # ----------------
    110 
    111 # Make sure we have a `TMPDIR` set. The `:=` parameter expansion assigns
    112 # the value if `TMPDIR` is unset or null.
    113 : ${TMPDIR:=/tmp}
    114 
    115 # Create a temporary directory for doing work. Use `mktemp(1)` if
    116 # available; but, since `mktemp(1)` is not specified by POSIX, fall back on naive
    117 # (and insecure) temp dir generation using the program's basename and pid.
    118 : ${WORK:=$(
    119       if command -v mktemp 1>/dev/null 2>&1
    120       then
    121           mktemp -d "$TMPDIR/$(basename $0).XXXXXXXXXX"
    122       else
    123           dir="$TMPDIR/$(basename $0).$$"
    124           mkdir "$dir"
    125           echo "$dir"
    126       fi
    127   )}
    128 
    129 # We want to be absolutely sure we're not going to do something stupid like
    130 # use `.` or `/` as a work dir. Better safe than sorry.
    131 test -z "$WORK" -o "$WORK" = '/' && {
    132     echo "$(basename $0): could not create a temp work dir."
    133     exit 1
    134 }
    135 
    136 # We're about to create a ton of shit under our `$WORK` directory. Register
    137 # an `EXIT` trap that cleans everything up. This guarantees we don't leave
    138 # anything hanging around unless we're killed with a `SIGKILL`.
    139 trap "rm -rf $WORK" 0
    140 
    141 # Preformatting
    142 # -------------
    143 #
    144 # Start out by applying some light preformatting to the `<source>` file to
    145 # make the code and doc formatting phases a bit easier. The result of this
    146 # pipeline is written to a temp file under the `$WORK` directory so we can
    147 # take a few passes over it.
    148 
    149 # Get a pipeline going with the `<source>` data. We append a trailing blank
    150 # comment at the end of the file to make sure we have an equal number of
    151 # code/comment pairs.
    152 
    153 # Folding.el support: turn {{{ folds }}} into titles -jrml
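        # As a made-up illustration, the sed/awk below turn an input such as
        #
        #     # {{{ Helpers
        #     gen_docs() {
        #
        # into
        #
        #     # # Helpers
        #     # ### gen_docs()
        #     gen_docs() {
        #
        # so fold markers and function definitions come out as Markdown headings.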
    154 (cat "$file" \
    155     | sed -e 's/^# {{{/# #/' -e 's/^# }}}.*/# --------------/' \
    156     | awk '
    157 /function.*\(\) {$/ { print "# ### " $2; print $0; next }
    158 /\(\) {$/ { print "# ### " $1; print $0; next }
    159 {print $0}' \
    160     && printf "\n\n# \n\n")         |
    161 
    162 # We want the shebang line and any code preceding the first comment to
    163 # appear as the first code block. This inverts the normal flow of things.
    164 # Usually, we have comment text followed by code; in this case, we have
    165 # code followed by comment text.
    166 #
    167 # Read the first code and docs headers and flip them so the first docs block
    168 # comes before the first code block.
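        # For instance, a made-up source that begins
        #
        #     #!/bin/sh
        #     # Frobnicate the input.
        #     x=1
        #
        # is emitted as
        #
        #     # Frobnicate the input.
        #     #!/bin/sh
        #     x=1
        #
        # and everything after that point is passed through untouched.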
    169 (
    170     lineno=0
    171     codebuf=;codehead=
    172     docsbuf=;docshead=
    173     while read -r line
    174     do
    175         # Issue a warning if the first line of the script is not a shebang
    176         # line. This can screw things up and wreck our attempt at
    177         # flip-flopping the two headings.
    178         lineno=$(( $lineno + 1 ))
    179         test $lineno = 1 && ! expr "$line" : "#!.*" >/dev/null &&
    180         echo "$(basename $0): ${file}:1 [warn] shebang line missing." 1>&2
    181 
    182         # Accumulate comment lines into `$docsbuf` and code lines into
    183         # `$codebuf`. Only lines matching `/#(?: |$)/` are considered doc
    184         # lines.
    185         if expr "$line" : '# ' >/dev/null || test "$line" = "#"
    186         then docsbuf="$docsbuf$line
    187 "
    188         else codebuf="$codebuf$line
    189 "
    190         fi
    191 
    192         # If we have stuff in both `$docsbuf` and `$codebuf`, it means
    193         # we're at some kind of boundary. If `$codehead` isn't set, we're at
    194         # the first comment/doc line, so store the buffer to `$codehead` and
    195         # keep going. If `$codehead` *is* set, we've crossed into another code
    196         # block and are ready to output both blocks and then straight pipe
    197         # everything by `exec`'ing `cat`.
    198         if test -n "$docsbuf" -a -n "$codebuf"
    199         then
    200             if test -n "$codehead"
    201             then docshead="$docsbuf"
    202                  docsbuf=""
    203                  printf "%s" "$docshead"
    204                  printf "%s" "$codehead"
    205                  echo "$line"
    206                  exec cat
    207             else codehead="$codebuf"
    208                  codebuf=
    209             fi
    210         fi
    211     done
    212 
    213     # We made it to the end of the file without a single comment line, or
    214     # there was only a single comment block ending the file. Output our
    215     # docsbuf or a fake comment and then the codebuf or codehead.
    216     echo "${docsbuf:-#}"
    217     echo "${codebuf:-"$codehead"}"
    218 )                                            |
    219 
    220 # Remove comment leader text from all comment lines. Then prefix all
    221 # comment lines with "DOCS" and code lines with "CODE".
    222 # The stream text might look like this after moving through the `sed`
    223 # filters:
    224 #
    225 #     CODE #!/bin/sh
    226 #     CODE #/ Usage: shocco <file>
    227 #     DOCS Docco for and in POSIX shell.
    228 #     CODE
    229 #     CODE PATH="/bin:/usr/bin"
    230 #     CODE
    231 #     DOCS Start by numbering all lines in the input file...
    232 #     ...
    233 #
    234 # Once we pass through `sed`, save this off in our work directory so
    235 # we can take a few passes over it.
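        # (One detail of the sed program below: the `:` prepended by the first
        # substitution acts as a sentinel. Whichever later rule matches strips it
        # and prints the line, so every line is classified exactly once.)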
    236 sed -n '
    237     s/^/:/
    238     s/^:[ 	]\{0,\}# /DOCS /p
    239     s/^:[ 	]\{0,\}#$/DOCS /p
    240     s/^:/CODE /p
    241 ' > "$WORK/raw"
    242 
    243 # Now that we've read and formatted our input file for further parsing,
    244 # change into the work directory. The program will finish up in there.
    245 cd "$WORK"
    246 
    247 # First Pass: Comment Formatting
    248 # ------------------------------
    249 
    250 # Start a pipeline going on our preformatted input.
    251 # Replace all CODE lines with entirely blank lines. We're not interested
    252 # in code right now, other than knowing where comments end and code
    253 # begins, and where code ends and comments begin.
    254 sed 's/^CODE.*//' < raw                      |
    255 
    256 # Now squeeze multiple blank lines into a single blank line.
    257 #
    258 # __TODO:__ `cat -s` is not POSIX and doesn't squeeze lines on BSD. Use
    259 # the sed line squeezing code mentioned in the POSIX `cat(1)` manual page
    260 # instead.
    261 cat -s                                       |
    262 
    263 # At this point in the pipeline, our stream text looks something like this:
    264 #
    265 #     DOCS Now that we've read and formatted ...
    266 #     DOCS change into the work directory. The rest ...
    267 #     DOCS in there.
    268 #
    269 #     DOCS First Pass: Comment Formatting
    270 #     DOCS ------------------------------
    271 #
    272 # Blank lines represent code segments. We want to replace all blank lines
    273 # with a dividing marker and remove the "DOCS" prefix from docs lines.
    274 sed '
    275     s/^$/##### DIVIDER/
    276     s/^DOCS //'                              |
    277 
    278 # The current stream text is suitable for input to `markdown(1)`. It takes
    279 # our doc text with embedded `DIVIDER`s and outputs HTML.
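        # (Markdown renders each `##### DIVIDER` line as `<h5>DIVIDER</h5>`,
        # which is exactly the marker the `csplit` below splits on.)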
    280 $MARKDOWN                                    |
    281 
    282 # Now this is where shit starts to get a little crazy. We use `csplit(1)` to
    283 # split the HTML into a bunch of individual files. The files are named
    284 # as `docs0000`, `docs0001`, `docs0002`, ... Each file includes a single
    285 # doc *section*. These files will sit here while we take a similar pass over
    286 # the source code.
    287 (
    288     csplit -sk                               \
    289            $CSPLITARGS                       \
    290            -f docs                           \
    291            -n 4                              \
    292            - '/<h5>DIVIDER<\/h5>/' '{9999}'  \
    293            2>/dev/null                      ||
    294     true
    295 )
    296 
    297 
    298 # Second Pass: Code Formatting
    299 # ----------------------------
    300 #
    301 # This is exactly like the first pass but we're focusing on code instead of
    302 # comments. We use the same basic technique to separate the two and isolate
    303 # the code blocks.
    304 
    305 # Get another pipeline going on our preformatted input file.
    306 # Replace DOCS lines with blank lines.
    307 sed 's/^DOCS.*//' < raw                     |
    308 
    309 # Squeeze multiple blank lines into a single blank line.
    310 cat -s                                      |
    311 
    312 # Replace blank lines with a `DIVIDER` marker and remove prefix
    313 # from `CODE` lines.
    314 sed '
    315     s/^$/# DIVIDER/
    316     s/^CODE //'                             |
    317 
    318 # Now pass the code through `pygmentize` for syntax highlighting. We tell it
    319 # that the input is `sh` and that we want HTML output.
    320 $PYGMENTIZE -l sh -f html -O encoding=utf8  |
    321 
    322 # Post filter the pygments output to remove partial `<pre>` blocks. We add
    323 # these back in at each section when we build the output document.
    324 sed '
    325     s/<div class="highlight"><pre>//
    326     s/^<\/pre><\/div>//'                    |
    327 
    328 # Again with the `csplit(1)`. Each code section is written to a separate
    329 # file, this time with a `codeXXX` prefix. There should be the same number
    330 # of `codeXXX` files as there are `docsXXX` files.
    331 (
    332     DIVIDER='/<span class="c"># DIVIDER</span>/'
    333     csplit -sk                   \
    334            $CSPLITARGS           \
    335            -f code               \
    336            -n 4 -                \
    337            "$DIVIDER" '{9999}'   \
    338            2>/dev/null ||
    339     true
    340 )
    341 
    342 # At this point, we have separate files for each docs section and separate
    343 # files for each code section.
    344 
    345 # HTML Template
    346 # -------------
    347 
    348 # Create a function for applying the standard [Docco][do] HTML layout, using
    349 # [jashkenas][ja]'s gorgeous CSS for styles. Wrapping the layout in a function
    350 # lets us apply it elsewhere simply by piping in a body.
    351 #
    352 # [ja]: http://github.com/jashkenas/
    353 # [do]: http://jashkenas.github.com/docco/
    354 layout () {
    355     cat <<HTML
    356 <!DOCTYPE html>
    357 <html>
    358 <head>
    359     <meta http-equiv='content-type' content='text/html;charset=utf-8'>
    360     <title>$1</title>
    361     <link rel=stylesheet href="docco.css">
    362     <link rel=stylesheet href="style.css">
    363     <link rel=stylesheet href="public/stylesheets/normalize.css">
    364 </head>
    365 <body>
    366 <div id=container>
    367     <div id=background></div>
    368     <table cellspacing=10 cellpadding=10>
    369     <thead>
    370       <tr>
    371         <th class=docs><h1>$1</h1></th>
    372         <th class=code></th>
    373       </tr>
    374     </thead>
    375     <tbody>
    376         <tr><td class='docs'>$(cat)</td><td class='code'></td></tr>
    377     </tbody>
    378     </table>
    379 </div>
    380 </body>
    381 </html>
    382 HTML
    383 }
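        # Any HTML fragment piped into `layout` gets the same page chrome; for
        # example, a hypothetical one-off page could be produced with:
        #
        #     echo '<p>hello</p>' | layout "hello" > hello.html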
    384 
    385 # Recombining
    386 # -----------
    387 
    388 # Alright, we have separate files for each docs section and separate
    389 # files for each code section. We've defined a function to wrap the
    390 # results in the standard layout. All that's left to do now is put
    391 # everything back together.
    392 
    393 # Before starting the pipeline, decide the order in which to present the
    394 # files.  If `code0000` is empty, it should appear first so the remaining
    395 # files are presented `docs0000`, `code0001`, `docs0001`, and so on.  If
    396 # `code0000` is not empty, `docs0000` should appear first so the files
    397 # are presented `docs0000`, `code0000`, `docs0001`, `code0001` and so on.
    398 #
    399 # Ultimately, this means that if `code0000` is empty, the `-r` option
    400 # should not be provided with the final `-k` option group to `sort`(1) in
    401 # the pipeline below.
    402 if stat -c"%s" /dev/null >/dev/null 2>/dev/null ; then
    403     # GNU stat
    404     [ "$(stat -c"%s" "code0000")" = 0 ] && sortopt="" || sortopt="r"
    405 else
    406     # BSD stat
    407     [ "$(stat -f"%z" "code0000")" = 0 ] && sortopt="" || sortopt="r"
    408 fi
    409 
    410 # Start the pipeline with a simple list of the split-out temp filenames, one
    411 # file per line.
    412 ls -1 docs[0-9]* code[0-9]* 2>/dev/null      |
    413 
    414 # Now sort the list of files by the *number* first and then by the type. The
    415 # list will look something like this when `sort(1)` is done with it:
    416 #
    417 #     docs0000
    418 #     code0000
    419 #     docs0001
    420 #     code0001
    421 #     docs0002
    422 #     code0002
    423 #     ...
    424 #
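        # (The first key, `-k1.5`, sorts numerically on the four-digit counter
        # that starts at character 5; the second key together with the optional
        # `r` chosen above controls whether `code` or `docs` comes first within
        # each numbered pair.)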
    425 sort -n -k"1.5" -k"1.1$sortopt"              |
    426 
    427 # And if we pass those files to `cat(1)` in that order, it concatenates them
    428 # in exactly the way we need. `xargs(1)` reads from `stdin` and passes each
    429 # line of input as a separate argument to the program given.
    430 #
    431 # We could also have written this as:
    432 #
    433 #     cat $(ls -1 docs* code* | sort -n -k1.5 -k1.1r)
    434 #
    435 # I like to keep things to a simple flat pipeline when possible, hence the
    436 # `xargs` approach.
    437 xargs cat                                    |
    438 
    439 
    440 # Run a quick substitution on the embedded dividers to turn them into table
    441 # rows and cells. This also wraps each code block in a `<div class=highlight>`
    442 # so that the CSS kicks in properly.
    443 {
    444     DOCSDIVIDER='<h5>DIVIDER</h5>'
    445     DOCSREPLACE='</pre></div></td></tr><tr><td class=docs>'
    446     CODEDIVIDER='<span class="c"># DIVIDER</span>'
    447     CODEREPLACE='</td><td class=code><div class=highlight><pre>'
    448     sed "
    449         s@${DOCSDIVIDER}@${DOCSREPLACE}@
    450         s@${CODEDIVIDER}@${CODEREPLACE}@
    451     "
    452 }                                            |
    453 
    454 # Pipe our recombined HTML into the layout and let it write the result to
    455 # `stdout`.
    456 layout "$title"
    457 
    458 # More
    459 # ----
    460 #
    461 # **shocco** is the third tool in a growing family of quick-and-dirty,
    462 # literate-programming-style documentation generators:
    463 #
    464 #   * [Docco][do] - The original. Written in CoffeeScript and generates
    465 #     documentation for CoffeeScript, JavaScript, and Ruby.
    466 #   * [Rocco][ro] - A port of Docco to Ruby.
    467 #
    468 # If you like this sort of thing, you may also be interested in Knuth's
    469 # massive body of work on literate programming:
    470 #
    471 #   * [Knuth: Literate Programming][kn]
    472 #   * [Literate Programming on Wikipedia][wi]
    473 #
    474 # [ro]: http://rtomayko.github.com/rocco/
    475 # [do]: http://jashkenas.github.com/docco/
    476 # [kn]: http://www-cs-faculty.stanford.edu/~knuth/lp.html
    477 # [wi]: http://en.wikipedia.org/wiki/Literate_programming
    478 
    479 # Copyright (C) [Ryan Tomayko <tomayko.com/about>](http://tomayko.com/about)<br>
    480 # This is Free Software distributed under the MIT license.
    481 :