File: rationale.html

package info (click to toggle)
soci 2.2.0-4
links: PTS, VCS
area: main
in suites: lenny
size: 3,400 kB
ctags: 1,969
sloc: cpp: 17,658; sh: 8,749; makefile: 379
file content (309 lines) | stat: -rw-r--r-- 14,546 bytes
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
  <meta content="text/html; charset=ISO-8859-1"  http-equiv="content-type" />
  <link rel="stylesheet" type="text/css" href="style.css" />
  <title>SOCI - rationale</title>
</head>

<body>
<p class="banner">SOCI - The C++ Database Access Library</p>

<h2>Rationale FAQ</h2>

<p>This part of the documentation is supposed to gather in a single place
the usual questions (and answers) about SOCI with regard to the design
decisions that shaped it.</p>

<h3>Q: Why "SOCI"?</h3>

<p>SOCI was initially developed in the environment where Oracle was the
main database technology in use. As a wrapper for the native OCI API
(Oracle Call Interface), the name "Simple Oracle Call Interface" was
quite obvious - until the 2.0 release, when the internal architecture
was largely redesigned to allow the use of <i>backends</i> that support other
database servers. We have kept the same name to indicate that Oracle is
the main supported technology in the sense
that the library includes only those features that were naturally
implemented in
Oracle. With the 2.1 release of the library, two new backends were
added (MySQL and SQLite3) and we decided to drop the original full name
so that new users looking for a library supporting any of these simpler
libraries are not discouraged by seeing "Oracle" somewhere in the name.
The other possible interpretation was "Syntax Oriented Call Interface",
which stresses the fact that SOCI was built to support the most natural
and easy interface for the user that is similar to the Embedded SQL
concept (see below). But on the other hand, SOCI also provides other
features (like object-relational mapping) and as a whole it is not just
"emulator" of the Embedded SQL. With all these considerations in mind,
SOCI is just "SOCI - The C++ Database Access Library".</p>

<p>Still, Oracle is considered to be the main driving server technology in
terms of the set of features that are supported by the library. This
also means that backeds for other servers might need to
work around some of the imposed idioms and protocols, but already
available and well-working PostgreSQL, MySQL and SQLite3 backends show
that it's actually not that
bad and the abstractions provided by the library are actually very
universal. Of
course, some of the features that were provided for Oracle might not be
supported for all other servers, but we think that it's better to have
one leading technology (where at least one group is fully happy)
instead of some "common denominator" for all databases (where <i>nobody</i> is happy).</p>

<h3>Q: Where the basic SOCI syntax comes from?</h3>

<p>The basic SOCI syntax was inspired by the Embedded SQL, which is part
of the SQL standard, supported by the major DB technologies and even
available as built-in part of the languages used in some DB-oriented
integrated
development environments. The term "Embedded SQL" is enough for Google
to spit millions of references - one of the typical examples is:</p>

<pre class="example">
{
    int a;
    /* ... */
    EXEC SQL SELECT salary INTO :a
             FROM Employee
             WHERE SSN=876543210;
    /* ... */
    printf("The salary is %d\n", a);
    /* ... */
}
</pre>

<p>The above is not a regular C (nor C++) code, of course. It's the mix of
C and SQL and there is a separate, specialized preprocessor needed to
convert it to something that the actual C (or C++) compiler will be
able to understand. This means that the compilation of the program
using embedded SQL is two-phase: preprocess the embedded SQL part and
compile the result. This two-phase development is quite troublesome,
especially when it comes to debugging. Yet, the advantage of it is that
the code expresses the programmer's intents in a very straightforward
way: read something from the database and put it into the local
variable. Just like that.</p>

<p>The SOCI library was born as an anwer to the following question: is it
possible to have the same expressive power without the disadvantages of
two-phase builds?</p>

<p>The following was chosen to be the basic SOCI syntax that can mirror
the above Embedded SQL example:</p>

<pre class="example">
int a;
sql &lt;&lt; "SELECT salary FROM Employee WHERE SSN=876543210", into(a);
</pre>

<p>(as you see, SOCI changes the order of elements a little bit, so that
the SQL query is separate and not mixed with other elements)</p>

<p>Apart from mimicking the Embedded SQL techniques in the regular, fully
standard C++ code, the above syntax has the following benefit: it is
<i>minimal</i> with respect to what has to
be said. Every single piece above is needed and expresses something
important, like:</p>
<ul>
  <li>which session should be used (the client can be connected to many
databases at the same time) - here, the <code>sql</code> object
encapsulates the session,</li>
  <li>what SQL query should be executed - here, it's the string
literal, but it could be also a <code>std::string</code> variable,</li>
  <li>where to put the result - here, the local variable <code>a</code>
will receive the result.</li>
</ul>

<p>Everything else is just a couple of operators that allow to treat the
whole as a single expression. It's rather difficult to remove anything
from this example.</p>

<p>The fact that the basic SOCI syntax is minimal (but without being
obscure at the same time, see below) means that the programmer does not
need to bother with unnecessary noise that some other database
libraries impose. We hope that after having written one line of code
like above by themselves, most programmers will react with something
like "how obvious!" instead of "how advanced!".</p>

<h3>Q: Why should I use SQL queries as strings in my program? I prefer the query to be generated or composed piece-by-piece by separate functions.</h3>

<p>First, you don't need to use SQL queries as string literals. In bigger
projects it is a common practice to store SQL queries externally (in a
file, or in a... database) and load them before use. This means that
they are not necessarily expected to appear in the program code, as
they do in our simple code examples and the advantage of separating
them from the source code of the main program is, among others, the
possibility to optimize and tune the SQL queries without recompilation
and relinking of the whole program.</p>

<p>What is the most important, though, is that SOCI does not try to mess
with the
text of the query (apart from very few cases), which means that the
database server will get exactly the same text of the query as is used
in the program. The advantage of this is that there is no new SQL-like
(or even SQL-unlike)
syntax that you would need to learn, and also that it's much easier to
convince a typical
DBA to help with SQL tuning or other specialized activities, if he is
given the material in the form that is not polluted with any foreign
abstractions.</p>

<h3>Q: Why not some stream-like interface, which is well-known to all C++ programmers?</h3>

<p>An example of the stream-like interface might be something like this
(this is imaginary syntax, not supported by SOCI):</p>

<pre class="example">
sql.exec("select a, b, c from some_table");

while (!sql.eof())
{
    int a, b, c;
    sql &gt;&gt; a &gt;&gt; b &gt;&gt; c;
    // ...
}
</pre>

<p>We think that the data stored in the relational database should be
treated as a set of relations - which is exactly what it is. This means
that what is read from the database as a result of some SQL query is a
<i>set of rows</i>. This set might be
ordered, but it is still a set of rows, not a uniform list of values.
This distinction might seem to be unnecessarily low-level and that the
uniform stream-like presentation of data is more preferable, but it's
actually the other way round - the set of rows is something more
structured (and that structure was <i>designed</i>
into the database) than the flat stream and
is therefore less prone to programming errors like miscounting the
number of values that is expected in each row.</p>

<p>Consider the following programming error:</p>

<pre class="example">
sql.exec("select a, b from some_table"); // only TWO columns

while (!sql.eof())
{
    int a, b, c;
    sql &gt;&gt; a &gt;&gt; b &gt;&gt; c; // this causes "row-tearing"
    // ...
}
</pre>

<p><i>"How to detect the end of each line in a file"</i> is a common beginner's question that
relates to the use of IOStreams - and this
common question clearly shows that for the record-oriented data the
stream is not an optimal abstraction. Of course, we don't claim that
IOStreams is bad - but we do insist that the record-oriented data is
better manipulated in a way that is also record-aware.</p>

<p>Having said that, we <i>have</i> provided some form of the stream-like
interface, but with the important limitation that the stream is always bound to the
single row, so that the row-tearing effect is not possible. In other words,
data returned from the database is still structured into rows, but each row can be
separately traversed like a stream. We hope that it provides a good balance between
convenience and code safety.</p>

<h3>Q: Why use indicators instead of some special value to discover that something is null?</h3>

<p>Some programmers are used to indicating the null value by using some
special (which means: "unlikely" to be ever used) value - for example,
to use the smallest integer value to indicate null integer. Or to use
empty string to indicate null string. And so on.</p>

<p>We think that it's completely wrong. Null (in the database sense) is an
information <i>about</i> the data. It describes the <i>state</i> of the data and if it's null,
then there's <i>no data at all</i>.
Nothing. Null. It does not make any sense to talk about some special
value if there is conceptually <i>no</i>
value - especially if we take into account that, for example, the
smallest integer value (or whatever else you choose as the "special"
value) might not be <i>that</i>
special in any given application or domain.</p>

<p>Thus, SOCI uses a separate indicators to describe the state of
exchanged data. It also has an additional benefit of allowing the
library to convey more than two states (null and not null). Indeed, the
SOCI library uses indicators also to report that the data was read, but
truncated (this applies to strings when reading to fixed-length
character arrays). Truncation is also an information about the data and
as such it's better to have it in addition to the data, not as part of
it.</p>

<h3>Q: Overloaded comma operator is just obfuscation, I don't like it.</h3>

<p>Well, consider the following:</p>

<p>"Send the query X to the server Y <i>and</i> put result into variable Z."</p>

<p>Above, the "and" plays a role of the comma. Even if overloading the
comma operator is not a very popular practice in C++, some libraries do
this, achieving terse and easy to learn syntax. We are pretty sure that
in SOCI the comma operator was overloaded with a good effect.</p>

<h3>Q: The <code>operator&lt;&lt;</code> provides a bad abstraction for the "input" statements.</h3>

<p>Indeed, the <code>operator&lt;&lt;</code> in the basic SOCI syntax
shows that something (the query) is sent somewhere (to the server).
Some people don't like this, especially when the "select" statements
are involved. If the high-level idea is to <i>read</i>
 data from somewhere, then <code>operator&lt;&lt;</code>
seems unintuitive to the die-hard IOStreams users. The fact is,
however, that the code containing SQL statement already indicates
that there is a client-server relationship with some other software
component (very likely remote). In such code it does not make any sense
to pretend that the communication is one-way only, because it's clear
that even the "select" statements need to be <i>sent</i> somewhere. This approach is
also more uniform and allows to cover other statements like "drop
table" or alike, where no data is expected to be exchanged at all (and
therefore the IOStreams analogies for data exchange have no sense at
all). No matter what is the kind of the SQL statement, it is <i>sent</i>
 to the server and this
"sending" justifies the choice of <code>operator&lt;&lt;</code>.</p>

<p>Using different operators (<code>operator&gt;&gt;</code> and <code>operator&lt;&lt;</code>)
as a way of distinguishing between different high-level ideas
(<i>reading</i> and <i>writing</i> from the data store,
respectively) does make sense on much higher level of abstraction,
where the SQL statement itself is already hidden - and we do encourage
programmers to use SOCI for implementing such high-level abstractions.
For this, the object-relational mapping facilities available in SOCI
might prove to be a valuable tool as well, as an effective bridge
between the low-level world of SQL statements and the high-level world
of user-defined abstract data types.</p>

<h3>Q: Why the Boost license?</h3>

<p>We decided to use the <a target="_blank"
 href="http://www.boost.org/LICENSE_1_0.txt">Boost license</a>, because
it's well recognized in the C++ community, allows us to keep our
minimum copyrights, and at the same time allows SOCI to be safely used
in commercial projects, without imposing concerns (or just plain
uncertainty) typical to other open source licenses, like GPL. We also
hope that by choosing the Boost license we have made the life easier
for both us and our users. It saves us from answering law-related
questions that were already answered on the <a target="_blank"
 href="http://www.boost.org/more/license_info.html">Boost license info
page</a> and it should also give more confidence to our users -
especially to those of them, who already accepted the conditions of the
Boost license (one license less to analyze).</p>

<p>Still, if for any reason the conditions of this license are not
acceptable, we encourage the users to contact us directly (see <a
 target="_blank" href="http://soci.sourceforge.net/people.html">links</a>
on the relevant SOCI page) to discuss any remaining concerns.</p>

<table class="foot-links" border="0" cellpadding="2" cellspacing="2">
  <tr>
    <td class="foot-link-left">
      <a href="backends.html">Previous (Backends reference)</a>
    </td>
    <td class="foot-link-right">
      <a href="backends/index.html">Next (Existing backends)</a>
    </td>
  </tr>
</table>

<p class="copyright">Copyright &copy; 2004-2006 Maciej Sobczak, Stephen Hutton</p>
</body>
</html>