Title: SQL Hacks: Tips & Tools for Digging into Your Data
Authors: Andrew Cumming & Gordon Russell
Pages: 386
Price: $29.99 USD
ISBN: 0-596-52799-3
Many of the recipes in SQL Hacks
(just released)
will improve the SQL you write day to day, and
many will give you the confidence to attempt much more involved tasks with SQL.
Other recipes will rarely if ever be needed, but make for entertaining and
educational reading in much the way that "worst case survival scenario" books do --
SQL is pitted against the most difficult analysis tasks just as survival scenario
books pit humans against pavement and lions.
SQL Hacks fits well in the Hacks series, which raises the bar on advanced books by offering
large, eclectic sets of tricks for problems that an unambitious person (a non-hacker)
wouldn't ever push technology hard enough to run into.
Put another way, the questions answered in a good Hacks book are ones that would get a
"good question" comment rather than an "RTFM!" response.
It does a good job continuing where O'Reilly's SQL Cookbook left off, which is always
difficult with two books written at slightly different times by different authors.
Still, it's harder to review a Hacks book than a Learning book as, with hacks, the
sky is the limit, and the reader will always find herself wishing for more.
To this end, I hope O'Reilly continues to publish newer editions of their various
Hacks books, drawing in more and more content in each edition, and identifying recipes
that might better serve in the Cookbook counterpart.
SQL Hacks skips most of the tutelage and shows you very specific ways for doing specific chores,
with more explanation of how to adapt it than theory behind it.
Most hacks have database-specific information for the five databases the book tackles, and many hacks
are inherently different on each system, making them completely different solutions to the same problem.
Those five databases are Microsoft Access, Microsoft SQL Server, MySQL, Oracle, and PostgreSQL;
most of the ideas require work to adapt or are completely specific to the database system, so I wouldn't
suggest straying lightly from this supported set.
The authors did their homework, and SQL Hacks' strengths are the depth, detail, and level
of knowledge with which each database system is covered, and the book's willingness to
get down and gritty.
There's never an impression that juicy details were omitted because the authors didn't
want to expend the effort to pick a colleague's brain or hunt down a factoid that never
got documented elsewhere.
Learning how to create indices on functions with multiple arguments in Postgres was
worth more than the "hack" it was a footnote in.
This dedication carries over to screen shots showing how something is done in Microsoft
Access directly opposite Unix shell pipelines between grep, perl, and the SQL command shell.
Most books, including mine, are a bit awkward or vague on either Unix or Microsoft Windows,
but the authors' and contributors' experience shows here, and platform-specific database topics are covered expertly.
Besides just database systems and platforms, the authors challenged themselves to show how to securely
and efficiently use the database interfaces of a set of languages: C#, Java, Perl, Python, and PHP.
The polish shows, and you'll have absolute confidence that all of the tricks really are at your
fingertips, regardless of your choice of operating system, database system, or programming language.
It gets bonus points for mentioning non-obvious types of input, such as cookies,
that must be sanitized or sent through bound parameters, in its discussion of SQL injections.
In the security department, it looks at SQL injections from three points of view: early on
in the book, correct code is shown; later, SQL injections are shown from the point of view of
the attacker, with several pages of strategies and scenarios for formulating attacks; and then
from the point of view of the defender, who has to defang and avoid these scenarios -- extra
bonus points for this comprehensive treatment.
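To make the bound-parameter point concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not taken from the book: the users table, the login_unsafe and login_safe helpers, and the plain-text password column are all assumptions made for illustration (the book covers password hashing separately).

```python
import sqlite3

def make_db():
    # Hypothetical schema; a real application would store password hashes.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return conn

def login_unsafe(conn, name, password):
    # Vulnerable: user input is pasted directly into the SQL text.
    sql = ("SELECT COUNT(*) FROM users WHERE name = '" + name +
           "' AND password = '" + password + "'")
    return conn.execute(sql).fetchone()[0] > 0

def login_safe(conn, name, password):
    # Bound parameters: the driver passes the values separately from the SQL,
    # so quote characters in the input are treated as data, not syntax.
    sql = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(sql, (name, password)).fetchone()[0] > 0

conn = make_db()
attack = "' OR '1'='1"  # a classic injection string supplied as the password
unsafe_result = login_unsafe(conn, "alice", attack)
safe_result = login_safe(conn, "alice", attack)
print(unsafe_result, safe_result)
```

The same attack string logs in through the concatenated query and is rejected by the parameterized one. The input could just as easily arrive in a cookie or hidden field as in a form box, which is the review's point about non-obvious input.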
If you're looking for a quick buy/don't buy indication, then, by all means, buy it.
That is, assuming that it's not intended to be your first or only SQL book.
By its own indication, it won't teach you the basics of database normalization, installation,
and so forth.
I would buy it as a second SQL book, though, after the fantastic _The Practical SQL Handbook_,
as it's written to a much higher standard than most books, and gets things right, such as security,
the intricacies of using a database to handle accounts, and transactions and shopping carts.
The cover text promises lots of advanced hackery, but that's vague.
"Pushing the limits of SQL"... "Solve puzzles using SQL"... "Manage users and audit the changes
they make to the database".
Here are the major sections:
SQL Fundamentals; Joins, Unions, and Views; Text Handling; Date Handling;
Number Crunching; Online Applications; Organizing Data; Storing Small
Amounts of Data; Locking and Performance; Reporting; Users and Administration;
and Wider Access.
Wider Access requires some explanation. It deals with locking down the various database
systems to securely provide guest accounts, or, more generally, to limit damage in the
case of an SQL injection attack or similar compromise.
With the rise of Web applications, databases are now misunderstood to be little more than
persistent data storage with a search capability.
The real utility of the database is to store data in small pieces that can be easily combined
again in complex ways to extract meaning from the data stored in the database,
and to do so in a way that expresses relationships between pieces of data without
losing track of facts being stored.
This book won't help you with that -- you need just about that much knowledge already.
For a first book, I highly recommend The Practical SQL Handbook,
which teaches relational design first thing, then moderately advanced reporting and problem solving, and, most of all, is
absolutely the best book out there for getting eased into the mindset of structuring
and querying data in a relational database system.
With some well designed tables, SQL Hacks will show you quite a few tricks, some of them
quite involved, quite non-obvious, and quite clever, to extract meaning from the data.
You'll probably learn quite a few new types of reports you can do -- intersecting ranges
from different sets of data, outputting SVG pie charts, swapping rows and columns,
finding medians, computing running totals, and computing running functions such as compound
interest struck me as the most useful and got mental bookmarks.
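As a rough illustration of one of those reports, a running total can be written as a correlated subquery. This is a sketch in Python with the standard sqlite3 module, not the book's listing; the ledger table and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical ledger: deposits are positive, withdrawals negative.
conn.execute("CREATE TABLE ledger (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO ledger (id, amount) VALUES (?, ?)",
    [(1, 100), (2, -30), (3, 45), (4, -15)],
)

# For each row, the subquery sums every row up to and including it.
running = conn.execute("""
    SELECT t.id,
           t.amount,
           (SELECT SUM(p.amount) FROM ledger p WHERE p.id <= t.id) AS balance
    FROM ledger t
    ORDER BY t.id
""").fetchall()

for row in running:
    print(row)
```

The subquery form is the portable one; systems with built-in windowing features can do the same job more directly.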
I have two metrics for this book, and I rate it against both
at the end of the review.
The first metric is whether I'd buy it if I came across it in a book store, and that's a function
of whether I'd have exhausted what it had to offer after an hour or so of furious skimming
and intentionally picking out the best parts from the table of contents.
Very few books make this cut for me.
The other metric is whether the authors did at least what I imagine I would have done were
I writing it.
This test is also a difficult one but builds in a great deal of forgiveness as my ideas are
quite likely dumb ones.
The Good:
I totally dig the cut-and-paste ASCII query results.
The authors could have easily marked all of those up in DocBook and made it prettier but
also alien compared to what you'll see at the computer.
They're not ashamed of the SQL command shell, and they're not ashamed of SQL.
Many hacks have several examples, covering the problem with different constraints and
end goals in mind.
Multi-platform, and thoroughly so.
One moment, it's showing how to use XSLT tools from the command line on
Microsoft Windows, and on the next page, there's a Unix shell pipeline with wget, xsltproc, and
grep.
Perl one-liners abound, and there are screen shots from Windows applications with
instructions for navigating the menus and setting the needed options.
You won't feel shortchanged for running the "wrong" platform.
When a powerful, modern SQL extension, such as replace, gets ratified by the
standards committees, the authors let you know.
Sidebars are spread around sharing the good news that something you might not
have heard of before is portable.
At the same time, some features are just fluff, and you're warned off of operations intentionally
left out of the SQL92 standard.
Sometimes database systems have non-portable local extensions, such as MySQL's
full-text indexing and SQL Server's XML handling features, and lots of these
get mentioned too, usually as variations on examples demonstrating the
feature as a short-cut or simplification.
The treatment of security is first rate.
The polish is top notch.
Writing a book is a huge undertaking, and the economics of book publishing gives publishers
little margin for advances.
A book that reads like its third release but is actually in its first can only be the
product of an exceptional level of dedication by the authors.
One of the authors also runs a community site, http://sqlzoo.net,
itself an exercise in dedication.
Rarely, the authors do get tutorial-ish, but only a little, and I think it works:
"Choose the right join style for your relationship" deals with the difference
between inner and outer joins, and whether records should be partially
populated with nulls or omitted entirely when relations between tables
can't be made for a record.
Another section shows how to convert between subqueries and outer joins,
and talks about when it's possible, and this serves as a sort of lesson
in demonstrating the equivalencies between the two.
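For readers who want to see the distinction, here is a small sketch in Python with the standard sqlite3 module. The customers and orders tables are invented for the example and are not the book's. It shows an inner join dropping the customer with no orders, a left outer join keeping that customer with a null, and the same "customers without orders" question asked once with a subquery and once with an outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 50), (2, 1, 20), (3, 2, 10);
""")

# Inner join: Cy has no orders, so Cy is omitted entirely.
inner = conn.execute("""
    SELECT c.name, o.total
    FROM customers c JOIN orders o ON o.customer_id = c.id
    ORDER BY c.name, o.total
""").fetchall()

# Left outer join: Cy is kept, with NULL (None in Python) in place of a total.
outer = conn.execute("""
    SELECT c.name, o.total
    FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.name, o.total
""").fetchall()

# "Customers with no orders", written as a subquery...
via_subquery = conn.execute("""
    SELECT c.name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
""").fetchall()

# ...and as an outer join that keeps only the unmatched rows.
via_outer_join = conn.execute("""
    SELECT c.name
    FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.id IS NULL
""").fetchall()

print(inner, outer, via_subquery, via_outer_join)
```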
The Bad:
The "Hacks" format is similar to the "Cookbook" format.
Both offer small, randomly-accessible (flip to it when you need it) examples of how to accomplish various tasks.
In traditional MIT circles, a hack is a piece of work that's either brilliant in its
simple elegance or else brilliant in its expediency and simple effectiveness, and as such, is worthy
of some esteem.
It's also work that's custom for a particular scenario and has limited domain -- in other words, it's a highly
specialized fix or improvement.
If a stock fix is applied systematically, that's mechanical, not clever.
By this definition, showing users how to invoke their SQL monitor, or showing users how to decide whether
to use an outer or inner join, are not hacks.
Few of the recipes triggered this peeve, and they were early in the book, but including those few muddles
the question of who the audience is, and lowers the standard for the Hacks series, endangering its
basic premise.
_SQL Hacks_ isn't alone in this sin; most of the Hacks books do it to some degree.
It was written by two professors at Napier University in Edinburgh, Scotland.
The style, grammar, and presentation are perfectly fine -- but only that.
It's not a bone dry college text book, but it was written with a dedication to professionalism
that can make a technical book tedious and will certainly keep it from becoming a classic.
The literary power of Brooks, Hoare, or Wall is conspicuously absent.
Authors of Hacks books are at liberty to tap the experiences of the best and brightest of the field,
and the best and brightest often have tricks just too strange, clever, or specialized to fit
into any ordinary sort of text.
I'd like to imagine that if I were charged with writing one of these, I'd have hundreds of
contributors (I'm not likable, but I am persistent).
Nothing against the contributors (two of them have more than 20 years of experience each), but why stop at three?
The Verdict:
I said I had two benchmarks: whether I'd be likely to walk out of a bookstore with it if I
had an hour alone with it to try to get my fill, and whether it touched on the subject that
I thought it should.
Before cracking the cover, I stopped to ponder what would really impress me, and what I'd like to see.
The International Functional Programming Competition had a puzzle solved by the contest winner using SQL.
I'd like to see similar combinatorics and optimization problems solved using SQL.
I'd like to see a good implementation of semi-infinite-strings, the text indexing data structure and algorithm that Google uses.
I've done a version of this, but my implementation leaves something wanting.
When reforming badly non-normalized databases, I've had to build a normalized database in parallel and populate it from
queries on the non-normalized one.
It would be interesting to hear how other people approach that problem, and what I can learn from them.
There are other jobs that I've tackled and managed despite never having been prepared for.
One example is renumbering a display_order priority on records in response to the user adjusting or reassigning priorities.
Trees built from self-joining tables are something more people should be exposed to, especially when presented
with non-normalized data.
There was no semi-infinite-string implementation, but the book showed how to build full-text
indexes the optimal way for each database, using built-in full-text indices and optional
add-on modules offering full-text indexing.
The renumbering example took the more general form of running-totals computations.
There were a few examples of self-joining data, and one tree example visualized the structure.
Normalizing data had tricks, including some with views, and it showed how to use Cartesian joins to
do combinatorics problems.
So, aside from one sort-of, the authors nailed my entire wish list.
That's amazing -- I've never had that happen before, actually.
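For a feel of what a Cartesian join buys you on a combinatorics problem, here is a toy sketch in Python with the standard sqlite3 module. The items table, the weights, the values, and the weight limit of 7 are all assumptions for illustration, not an example from the book. A self cross join enumerates every unordered pair of items, and a second query picks the most valuable pair that fits under the limit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, weight INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("map", 1, 3), ("rope", 3, 4), ("stove", 4, 6), ("tent", 5, 8)],
)

# Cross join the table with itself; a.name < b.name keeps each unordered
# pair once and drops an item paired with itself.
pairs = conn.execute("""
    SELECT a.name, b.name
    FROM items a CROSS JOIN items b
    WHERE a.name < b.name
    ORDER BY a.name, b.name
""").fetchall()

# A tiny optimization problem: the most valuable pair weighing at most 7.
best = conn.execute("""
    SELECT a.name, b.name, a.value + b.value AS value
    FROM items a CROSS JOIN items b
    WHERE a.name < b.name
      AND a.weight + b.weight <= 7
    ORDER BY value DESC
    LIMIT 1
""").fetchone()

print(len(pairs), best)
```

The number of rows grows with the product of the table sizes, so this only stays pleasant for small inputs.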
The highest endorsement a book can earn from me (a cheapskate, who already has a good deal of
knowledge from working in the industry for ten years) is getting bought on a random trip to the
bookstore where I hadn't been looking for or intending to buy anything, and paying full price on top of that.
Books that are surprising, riveting, and so packed with information that I couldn't possibly copy
all of the best parts down and exhaust it in an hour or two are the ones that get purchased in this manner.
I have _SQL Hacks_ in my hot little hands here at home, so this benchmark is now synthetic, but...
I'm somewhat undecided, and not sure whether I would or wouldn't walk out with it.
More likely, I'd just put it on my http://half.com wishlist and pick it up later, for a discount
(I'm a cheapskate, remember).
If you don't know how to do more than half of the things listed in the table of contents, most certainly buy it.
If you find yourself frequently working with SQL and constantly face new problems, buy it.
If you find yourself still learning SQL and wanting a variety of examples, buy it.
If you're shopping for a handful of good SQL books, buy it.
On a scale of stuff lying around the house, I give it 7 gold stars, half a box of binder
clips, some AA batteries, and a bottle of really good soy sauce.
Favorite Hacks:
The first hack is getting into the SQL monitor for your database.
I'd have to rule that anything documented in the documentation, near the beginning, outside
of a footnote is, by definition, not a hack.
But points for giving a short VBA listing to suck raw SQL into Microsoft Access from a file.
Though they don't officially support DB2, it's covered here.
Solve a crossword puzzle using SQL: including how to load the dictionary file into
the database.
Replacing tables with views during refactoring or cleanup.
Simplify complicated updates has a few examples, including creating a view
just for the purpose of updating from it, for cases where the update syntax
itself isn't adequate.
Generate Combinations shows how to do permutations with Cartesian products,
useful for solving optimization problems,
noting that it's "usually a mistake" to do this.
Efficient full text searching using features built in to the various database systems, or
available as add-ons.
Solve anagrams, by computing hashes for sets of letters from words from a dictionary and
matching against that.
Sort your email, which shows how to sort by function results, such as substrings extracted from
the domain name part of email addresses.
It also shows how to build indices on function results of functions with multiple
arguments in Postgres, something Postgres does not willingly do.
Uncover trends in your data, talks about grouping your data in various ways
before computing totals on it to bring out meaning in data that has natural
cycles (daily cycles, weekly cycles, and so forth), and to average parts
of a cycle together and plot the moving averages.
The hack puts SQL's modulo operator to use.
Date calculations to find date ranges, such as this month, last month, next month,
last year, and so forth.
Quarterly reports, using a case statement that breaks up the months into four sets.
Finding relative dates, such as "the second Tuesday of the month".
Multiply across a result set, such as when computing compound interest.
Keep a running total, such as a bank balance, done using a subquery for one
system, a variable for another, and built-in features for the purpose for another.
Identify overlapping ranges, which compares two data sets and finds records
with ranges that overlap between the two sets.
Calculating the maximum of two fields, which, like many hacks, is a question of
somewhat clever use of subqueries.
Disaggregating a count; references SQL Puzzles and Answers and
involves a table of integers and a good deal of trickery.
Getting values and subtotals in one shot, using a union.
Calculating a median, which uses a few subqueries.
Tally results into a chart, which makes ASCII bar charts out of data, right
in the SQL command shell.
It does CSS-HTML bar graphs too, with a slight modification.
Calculate the distance between GPS locations; simple trig, no real SQL trickery.
Reconciling invoices and remittances, which deals with finding perfect
matches between similar data sets, and then successively more approximate matches.
The next example is finding transposition errors, where data entry finger
race conditions caused digits to be entered out of order (swapped).
Some accounting knowledge is stuck in here: totals off by multiples
of 9 suggest transposition errors, and the magnitude of the unsettled
amount indicates which columns were swapped.
And then SQL is provided that picks out the potentially erroneous numbers.
Computing progressive taxes (or bracketed taxes).
This uses a table in place of a switch or if-else.
Calculating rank, which is the position each value would hold within its group if the values were sorted.
Copy Web pages into a table using XSLT.
Present data graphically using SVG (scalable vector graphics), which generates
a pie chart right from the database for viewing in your Web browser.
Tunneling into MySQL from Microsoft Access, using Plink.
Process Web server logs, parses logs and loads the database with a small
Perl example, with several examples of reporting on the data.
Store images in a database (or other large, binary objects), with examples of
fetching them out again and sending them down with the right MIME headers for
display.
Exploit an SQL injection vulnerability, with a tell-tale error screen, the
likely SQL on the server, and strategies for coming up with SQL to insert
into it. Several pages are given to strategies and techniques for bypassing
different kinds of tests and causing rows to be returned even when you
don't know the password.
Preventing a SQL injection attack, which talks about escaping in various
languages, and the need to escape data from cookies, hidden fields, and
other pieces of input that people sometimes forget about.
Keep track of infrequently changing values.
Display rows as columns (ooh, useful), using a self-join.
Importing someone else's data, which deals with normalizing and converting
structure.
Matching one-to-many records against other one-to-many records with an
example of a dating site match maker.
I could imagine this being adapted to single out zergers in an online game,
by finding characters who tend to use the same IP and be in the same locations.
Cope with unexpected redo, which talks through some failure scenarios, using
shopping carts and shopping sites as an example.
Mix file and database storage, which outlines the various problems with storing
filenames in the database and practices for dealing with them.
Fill in missing values in a pivot table, using some heuristics and guesswork, automated.
Identify table updates uniquely, such as when batching updates and manually keeping sync.
Play six degrees of Kevin Bacon.
Building decision tables; "when you need a query to make decisions based on multiple
criteria you can hardcode the logic into a query, or you can use a decision table".
Traverse a simple tree, using self-joining data.
Generate a calendar from SQL -- useful!
Implementing application level accounts, including password hashing and strength testing.
Deploy applications that depend on databases (initialization, etc).
Find and stop long-running queries.
Don't run out of disc space (kill temp-space over-using queries).
Run SQL from a Web page, using one of many Web-SQL-shell applications.
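The accounting trick in the transposition-error hack above is worth a worked example. Swapping two adjacent digits a and b changes a number by 9 x (a - b) x a power of ten: 1540 keyed in as 1450 is off by 90, which is 9 x 10, pointing at the tens and hundreds columns. What follows is my own sketch in Python with the standard sqlite3 module, not the book's SQL; the invoices and remittances tables are invented, and amounts are assumed to be whole units.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE remittances (invoice_id INTEGER, amount INTEGER);
    INSERT INTO invoices VALUES (1, 1540), (2, 230), (3, 817), (4, 510);
    -- 1 and 3 have swapped digits, 2 matches, 4 is simply short by 10.
    INSERT INTO remittances VALUES (1, 1450), (2, 230), (3, 871), (4, 500);
""")

# Mismatched pairs whose difference is a non-zero multiple of 9 are
# candidates for a transposition error; other mismatches are not flagged.
suspects = conn.execute("""
    SELECT i.id, i.amount, r.amount, ABS(i.amount - r.amount) AS diff
    FROM invoices i JOIN remittances r ON r.invoice_id = i.id
    WHERE i.amount <> r.amount
      AND ABS(i.amount - r.amount) % 9 = 0
    ORDER BY i.id
""").fetchall()

for row in suspects:
    print(row)
```

A difference that is a multiple of 9 only suggests a transposition; it does not prove one, so the output is a list of rows for a human to check.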