This is a substantially re-written fork of Stefan Milton Bache's squr package.
The squr (pronounced "skewer") package provides a set of tools
for managing structured query language (SQL) files in R projects.
It promotes the separation of logic and query scripts and eases the process
of reading, parameterizing, and composing SQL queries.
The primary changes in this fork are: the removal of several features that are
unnecessary for my own purposes (transactions, INSERT statements, ignore blocks,
composing SQL snippets with +) although I may add them back at a later date.
Additionally, I've made significant internal changes to handle the (specific
to me) situation of needing to run queries with very long IN clauses with
more than 1000 elements. This includes new parameter specification methods
that use the helper functions IN() and LIKE(). Finally there is a minimal
system for documenting the associated SQL files.
squr uses DBI::sqlInterpolate and DBI::dbQuoteString to parse values, but
also allows for arbitrary string manipulation of SQL queries via sq_replace.
Interpolating values is not as secure as using each database's
parameterized query functionality.
Because of this, squr is aimed at users for whom SQL injection attacks are a fairly low concern. This may include situations like,
- you interact with a local database on your machine with limited or no internet connectivity
- SQL parameter input values are always generated by you, not unknown remote user input
- you write code against a database for which you only have
SELECTprivileges - the code you write with squr is not being run by anyone other than you (e.g. not in web apps like Shiny)
The farther away from these sorts of situations you are, the less likely it is that squr is a secure choice. However, whenever the values originate from user input (e.g. in a Shiny/web application, or web services, etc), approprite precautions should still be taken (type checking, whitelisting, etc.)
OWASP has some good guidelines on how to prevent SQL injection in your applications.
In the following example the sql/query.sql file is read,
parameterized, and sent. There is no database connectivivty in squr,
this is left fully flexible and out of scope.
# Simple example of a query function wrapper. This part varies depending
# on database, drivers, etc, but needs only setup once.
rodbc <- function(query){
ch <- RODBC::odbcDriverConnect("<connectionstring>")
on.exit(RODBC::odbcClose(ch))
RODBC::sqlQuery(ch, query)
}
# Example of reading file, setting a parameter value, and sending the query,
# using the `sq_*` ("skew") functions.
result <-
sq_file("sql/query") %>%
sq_set(Param = value) %>%
sq_send(.with = rodbc)The corresponding query.sql file could look something like:
SELECT *
FROM TheTable
WHERE Param = @ParamNote that many arguments in squr are prefixed with a dot; this is to
avoid name clash with values in ... arguments.
Arguments are not actually bound to parameters in sq_set, they are merely
recorded in the object. Parameters are bound just before being sent to the
database in sq_send.
Note:
- the
.sqlextension can be omitted, - when called from within a package, the path is relative to the
installation folder.
There is also sq_text, which is a way to use inline SQL. Both sq_file and
sq_text produces S3 objects which are lists with an additional class
sq to enable a few methods (e.g. print).
Suppose we have the following SQL:
SELECT *
FROM table
WHERE column IN @columnand we'd like to be able to run this query with various collections of values
inserted for @column. In squr we can do the following:
result <- sq_text(.query = "select * from table where column in @column") %>%
sq_set(column = IN(1:10)) %>%
sq_send(.with = rodbc)Wrapping the vector of values to be bound with IN() ensures that they are
properly formatted and wrapped in parens. This will work for only a single value,
converting IN(1) into column IN (1). Also, if the vector of values passed
is longer than 1000 squr will split it into chunks of size at most 1000 and
then send multiple queries for each chunk, rbinding the results.
Splitting IN clauses like this only works for one IN clause per SQL query.
If our query looked like this:
SELECT *
FROM table
WHERE column LIKE @columnwe can pass in values for @column like this:
result <- sq_text(.query = "select * from table where column like @column") %>%
sq_set(column = LIKE("foo",side = "l",wildcard = "%")) %>%
sq_send(.with = rodbc)The resulting bound query would look like this:
SELECT *
FROM table
WHERE column LIKE '%foo'There is also a function sq_replace that is a wrapper for gsub that allows
you to edit SQL text, but only prior to setting parameter values with sq_set.
Since values are appropriately quoted when they are bound, the default
replacements will not work for dynamically specifying column and table names.
However, you can use sq_value explicitly (this is the function used
internally to prepare a value for SQL):
-- The SQL file
SELECT [Date]
, [CustomerId]
, [CustomerName]
, @Feature
FROM Customers
WHERE Date BETWEEN @DateFrom AND @DateTo# R
result <-
sq_file("customers") %>%
sq_set(DateFrom = Sys.Date() - 10,
DateTo = Sys.Date(),
Feature = sq_value("Turnover", quote = "[")) %>%
sq_send(.with = rodbc)squr allows for limited documentation within your .sql files, using a very simple syntax along the lines of roxygen2. All commented out lines at the beginning of a .sql file are assumed to be documentation. An example illustrating the format is:
-- Test Query
--
-- Pulls `column1` and `column1` from `my_table` filtering on a particular value for
-- `condition`.
--
-- @param foo a value for condition, e.g. "bar"
-- @param bar a value for another condition, this time with a
-- longer description across multiple lines.
-- @functions my_fun, your_fun, lots_of_fun,
-- more_fun, final_fun
-- @scripts script1.R, script2.R
select
column1,
column2
from
my_table
where
condition = @foo
another_condition = @bar
The first line, Test Query is the query title. The next section forms the
description, with a blank commented line above and below.
Only three tags are recognized:
@parama single word denoting the name of the paramter, a space and then a description of the parameter@functionsa comma separated list of functions that the SQL query is used in, e.g. for when squr is used within another package@scriptsa comma separated list of .R scripts that the SQL query is used in, e.g. for when squr is used in an RStudio project.
Other tags are ignored.
A similar (but different) project for Clojure (with ports for some other languages) by @krisajenkins is Yesql.
