lf38, SoftwareDevelopment: Introduction to SQL -- Installation of PostgreSQL

News Archives Map

by Manuel Soriano
<manu(at)europa3.com>

Translated to English by:
Miguel A Sepulveda <sepulveda(at)linuxfocus.org>

Content:

Introduction

Introduction to SQL -- Installation of PostgreSQL

Abstract:

This short course consists of three parts, in the first part we explain a few generalities about SQL using a public domain database server called postgreSQL. In the second part we will study in more detail the SQL commands. Finally the th ird part will show us some of the advanced options of the SQL commands, functions specific of PostgreSQL that might be of interest for our projects and we will finally review a small C program putting it all together.

_________________ _________________ _________________

Introduction

During this quick introduction I will only refer to databases (DB). There are other organisations of data, but writing about them would be outside the scope of our course.

Until recently, the access to data items was accomplished through entities that were inter-related by well-defined links of the database scheme. This type of access had advantages, mainly speed of access, but a big disadvantage: we could only access data through an existing link, for example:

     country -> states -> counties

but never :

     country -> counties

where "->" is the link.

If we wish to establish that second relationship we would have to redefine the scheme and compile it again....

In fact, in a hierarchical DB, the relationship among the various entities is static and can only be modified after alteration of the DB scheme and recompilation of the latter.

The basic idea behind relation databases is precisely to link data during the query instance, without the need for a static link, but instead using an identifier that permits to link one register with another.

What I just wrote probably needs an Aspirin :)

Relational database managers do not required static links to allow us go down the hierarchy of entities, instead they use a unique code that identify these entities while establishing a temporary relation as a result to a query.

The identification is nothing but a code. Ex: My phone number is not:

1234567

but :

34 6 1234567

Clearly my phone number is identified by the country code (34), state code (6) and the proper device number (1234567).

In the set of countries, the code 34 (Spain) is unique.
In the set of states, the code 34-6 (Spain/Valencia) is unique.
In the set of devices, the code 34-6-1234567 (Spain/Valencia/my phone number) is unique.

Let me set the foundations for the first example that illustrates what I just said.

     All counties have a code, belong to a state and a country.
     All states have a code and belong to a country.
     All countries have a code.

To find all the counties in a state we relate the county with the state through the country and county codes; to find all the counties in a country we relate the county to the country by the country code. These relationships are temporary and only exist during the time of my query.

This is a bit arid and hard to understand but with the first few examples I hope to make clearer the concept of code and belonging.

When I send the first query to the DB manager it returns me all the related data items. But what data am I really receiving? The union of country and county items, for every county I am going to receive the related country items.

During my first query a new nameless entity is suddenly created, it contains a replica of countries and counties. This new entity, once again, disappears at the end of my query.

Before we used to name a set of data a "file". This are made of registers and each "register" has a "field". Well, in a relational database, a "file" is known as a table, a table contains rows and each row has columns, this is just a small cosmetic change. ;-)

It is good to mention at this point that some hierarchical DB managers introduce SQL as an access language, but this is only anecdote. SQL language is almost exclusively a property of relational managers.

To illustrate the use of SQL we will use the relational managers PostgreSQL. Although it is not fully compliant with the rules of SQL, it is sufficiently close for our purposes, it is also a very good manager for more heavy duty tasks.

Let me explain only briefly the installation process, given the goal of this article is SQL. First download the sources from www.postgresql.org, as well as any patches available. Extract the sources (tar zxvf) to a directory, cd postgresql-6.3

cd src
./configure --prefix=/the/desired/path
make all >& make.log &
tail -f make.log

export PATH=$PATH:/the/desired/path/pgsql/bin
export MANPATH=$MANPATH:/the/desired/path/pgsql/man
export PGLIB=/the/desired/path/pgsql/lib
export PGDATA=/the/desired/path/pgsql/data

initdb
createdb test
psql test
Welcome to the POSTGRESQL interactive sql monitor:
  Please read the file COPYRIGHT for copyright terms of POSTGRESQL

   type \? for help on slash commands
   type \q to quit
   type \g or terminate with semicolon to execute query
 You are currently connected to the database: postgres

test=>

This is the prompt for postgres, now we can start executing commands.

mytest=>create table mytest (field1 varchar(10));
CREATE

mytest=>insert into mytest values ('hello');
INSERT number 1

mytest=>commit work;
NOTICE:EndTransactionBlock and not inprogress/abort state
END

mytest=>select * from mytest;
field1
------
hello
(1 row)

mytest=>drop table mytest;
DROP

mytest=>Ctrl-d

We have already exited the SQL console.

In case of having trouble compiling and installing Postgres95 correctly please refer to the INSTALL file in the root directory for the distribution.

Let me make another side comment; a relational database server is generally made of the following parts:

Data access layer
SQL processing layer
SQL parser layer
Communications layer

As clients we connect to the 4th layer, we send SQL commands which are then passed to the parser layer. This translates the commands and, in the absence of errors, sends the requests to the second layer. All the processing and querying operations take place at this level in collaboration with the data access layer obviously: data is collected, errors are transmitted to the client via the communications layer. The SQL processing layer establishes a dialogue with the client while managing the correct transfer of data and controlling the transactions and interrupts.

First Steps

Next I will illustrate with an example what has been described so far, let us build three tables (or files):

File: countries.sql
create table countries (cod_country integer, name varchar(30));
insert into countries values (1, 'country 1');
insert into countries values (2, 'country 2');
insert into countries values (3, 'country 3');
commit work;

File: states.sql
create table states (cod_State int,
            cod_country int,
            nam_State varchar(30));
insert into states values (1, 1, 'State 1, Country 1');
insert into states values (2, 1, 'State 2, Country 1');
insert into states values (1, 2, 'State 1, Country 2');
insert into states values (2, 2, 'State 2, Country 2');
insert into states values (1, 3, 'State 1, Country 3');
insert into states values (2, 3, 'State 2, Country 3');
commit work;

File: counties.sql
create table counties (cod_country int,
            cod_state int,
            cod_county int,
            nam_county varchar(60));
insert into counties values (1, 1, 1, 'County 1, State 1, Country 1');
insert into counties values (2, 1, 1, 'County 2, State 1, Country 1');
insert into counties values (3, 1, 1, 'County 3, State 1, Country 1');
insert into counties values (1, 2, 1, 'County 1, State 2, Country 1');
insert into counties values (2, 2, 1, 'County 2, State 2, Country 1');
insert into counties values (3, 2, 1, 'County 3, State 2, Country 1');
insert into counties values (1, 3, 1, 'County 1, State 3, Country 1');
insert into counties values (2, 3, 1, 'County 2, State 3, Country 1');
insert into counties values (3, 3, 1, 'County 3, State 3, Country 1');
insert into counties values (1, 1, 2, 'County 1, State 1, Country 2');
insert into counties values (2, 1, 2, 'County 2, State 1, Country 2');
insert into counties values (3, 1, 2, 'County 3, State 1, Country 2');
insert into counties values (1, 2, 2, 'County 1, State 2, Country 2');
insert into counties values (2, 2, 2, 'County 2, State 2, Country 2');
insert into counties values (3, 2, 2, 'County 3, State 2, Country 2');
insert into counties values (1, 3, 2, 'County 1, State 3, Country 2');
insert into counties values (2, 3, 2, 'County 2, State 3, Country 2');
insert into counties values (3, 3, 2, 'County 3, State 3, Country 2');
insert into counties values (1, 1, 3, 'County 1, State 1, Country 3');
insert into counties values (2, 1, 3, 'County 2, State 1, Country 3');
insert into counties values (3, 1, 3, 'County 3, State 1, Country 3');
insert into counties values (1, 2, 3, 'County 1, State 2, Country 3');
insert into counties values (2, 2, 3, 'County 2, State 2, Country 3');
insert into counties values (3, 2, 3, 'County 3, State 2, Country 3');
insert into counties values (1, 3, 3, 'County 1, State 3, Country 3');
insert into counties values (2, 3, 3, 'County 2, State 3, Country 3');
insert into counties values (3, 3, 3, 'County 3, State 3, Country 3');
commit work;

A file with SQL commands can be executed from pgsql like this:

\i file_name

We could also insert the commands using a simple cut & paste.

Let us see next what counties are available:

manu=> select * from counties;
cod_country|cod_state|cod_county|nam_county
-----------+---------+----------+----------------------------
          1|        1|         1|County 1, State 1, Country 1
          2|        1|         1|County 2, State 1, Country 1
          3|        1|         1|County 3, State 1, Country 1
          1|        2|         1|County 1, State 2, Country 1
          2|        2|         1|County 2, State 2, Country 1
          3|        2|         1|County 3, State 2, Country 1
          1|        3|         1|County 1, State 3, Country 1
          2|        3|         1|County 2, State 3, Country 1
          3|        3|         1|County 3, State 3, Country 1
          1|        1|         2|County 1, State 1, Country 2
          2|        1|         2|County 2, State 1, Country 2
          3|        1|         2|County 3, State 1, Country 2
          1|        2|         2|County 1, State 2, Country 2
          2|        2|         2|County 2, State 2, Country 2
          3|        2|         2|County 3, State 2, Country 2
          1|        3|         2|County 1, State 3, Country 2
          2|        3|         2|County 2, State 3, Country 2
          3|        3|         2|County 3, State 3, Country 2
          1|        1|         3|County 1, State 1, Country 3
          2|        1|         3|County 2, State 1, Country 3
          3|        1|         3|County 3, State 1, Country 3
          1|        2|         3|County 1, State 2, Country 3
          2|        2|         3|County 2, State 2, Country 3
          3|        2|         3|County 3, State 2, Country 3
          1|        3|         3|County 1, State 3, Country 3
          2|        3|         3|County 2, State 3, Country 3
          3|        3|         3|County 3, State 3, Country 3
(27 rows)
manu=>

There are 27 rows and pgsql is now waiting the next command, try this one:

manu=> select * from countries, states;
cod_country|name     |cod_state|cod_country|nam_state
-----------+---------+---------+-----------+------------------
          1|country 1|        1|          1|State 1, Country 1
          2|country 2|        1|          1|State 1, Country 1
          3|country 3|        1|          1|State 1, Country 1
          1|country 1|        2|          1|State 2, Country 1
          2|country 2|        2|          1|State 2, Country 1
          3|country 3|        2|          1|State 2, Country 1
          1|country 1|        1|          2|State 1, Country 2
          2|country 2|        1|          2|State 1, Country 2
          3|country 3|        1|          2|State 1, Country 2
          1|country 1|        2|          2|State 2, Country 2
          2|country 2|        2|          2|State 2, Country 2
          3|country 3|        2|          2|State 2, Country 2
          1|country 1|        1|          3|State 1, Country 3
          2|country 2|        1|          3|State 1, Country 3
          3|country 3|        1|          3|State 1, Country 3
          1|country 1|        2|          3|State 2, Country 3
          2|country 2|        2|          3|State 2, Country 3
          3|country 3|        2|          3|State 2, Country 3
(18 rows)

18 rows ??? We inserted 3 countries and 6 states, all identify a single country. How is it possible we get 18 rows?

The last command has performed a union of two tables, we have related the table of countries with the table of counties, since we have not specify any union exclusion rule, pgsql returned ALL possible rows of countries related with ALL rows of states, i.e. 3 for countries times 6 for states for a total of 18. This result is obviously illogic and useless, better if we could have done the following:

manu=> select * from countries, states
manu-> where countries.cod_country = states.cod_country;
cod_country|name     |cod_state|cod_country|nam_state
-----------+---------+---------+-----------+------------------
          1|country 1|        1|          1|State 1, Country 1
          1|country 1|        2|          1|State 2, Country 1
          2|country 2|        1|          2|State 1, Country 2
          2|country 2|        2|          2|State 2, Country 2
          3|country 3|        1|          3|State 1, Country 3
          3|country 3|        2|          3|State 2, Country 3
(6 rows)

Well, this begins to appear a bit more reasonable, Six rows, Correct?

Yes, there are six counties and each county is in a country. It is reasonable to get a number of rows identical to the number of counties because country is a qualifier of counties. We just related the table of countries with the table of counties via the country code. Remember that countries have a code and counties have the code of the country they belong to.

Why countries.cod_country = states.cod_country ?

The country code in the table of countries is cod_country and in the table of counties too, therefore:

cod_country = cod_country

is illogical, the interpreter will never now which of the two to use and it would return us an error:

select * from countries, states
        where cod_country = cod_country;

ERROR:  Column cod_country is ambiguous

Next, we can use aliases for columns:

manu=> select * from countries a, states b
manu-> where a.cod_country = b.cod_country;
cod_country|name     |cod_state|cod_country|nam_state
-----------+---------+---------+-----------+------------------
          1|country 1|        1|          1|State 1, Country 1
          1|country 1|        2|          1|State 2, Country 1
          2|country 2|        1|          2|State 1, Country 2
          2|country 2|        2|          2|State 2, Country 2
          3|country 3|        1|          3|State 1, Country 3
          3|country 3|        2|          3|State 2, Country 3
(6 rows)

What does the manager return?: cod_country, name, cod_state, cod_country y nam_state.

Since we query "select * from countries, states", where the * is a wild card that stands for EVERYTHING, we obtained the two columns for countries and the three for counties. Now we would like to be more specific:

manu=> select a.cod_country, cod_state, name, nam_state
manu-> from countries a, states b
manu-> where a.cod_country = b.cod_country;
cod_country|cod_state|name     |nam_state
-----------+---------+---------+------------------
          1|        1|country 1|State 1, Country 1
          1|        2|country 1|State 2, Country 1
          2|        1|country 2|State 1, Country 2
          2|        2|country 2|State 2, Country 2
          3|        1|country 3|State 1, Country 3
          3|        2|country 3|State 2, Country 3
(6 rows)

In the last command we explicitly asked for the country code, the state code and the name of the country and state. Notice that some column names are qualified (a.cod_country) while others are not (nam_state), this is because cod_country is repeated in both tables while nam_state exists only in states. Unique column names do not need extra qualifiers.

Let us make things more complicated:

manu=> select a.cod_country, cod_state, name, nam_state
manu-> from countries a, states b
manu-> where a.cod_country = b.cod_country
manu-> and a.cod_country = 3;
cod_country|cod_state|name     |nam_state
-----------+---------+---------+------------------
          3|        1|country 3|State 1, Country 3
          3|        2|country 3|State 2, Country 3
(2 rows)

This time we limited the search to country number 3 only.

Functions

Here is an example of using the row counting function count():

select count(*) from states;

count
-----
   27
(1 row)

It returns the number of rows contained in the table of counties, next:

manu=> select cod_country, count(*) from states
manu-> group by cod_country;
cod_country|count
-----------+-----
          1|    2
          2|    2
          3|    2
(3 rows)

It returns the number of rows with IDENTICAL country code, this is the reason for using the cod_country.

An even better example:

manu=> select name, count(*) from countries a, states b
manu-> where a.cod_country = b.cod_country
manu-> group by name;
name     |count
---------+-----
country 1|    2
country 2|    2
country 3|    2
(3 rows)

We still obtained the same three rows but this time the returned information is more clear.

Well until now we have only given an introduction, just warming up :-)

Concepts Review

So far we have seen a few very basic concepts of SQL. The most relevant thing here is the concept itself of SQL. We do not work with concrete data anymore but with data entities. A data entity is an abstract concept of databases. Simplifying it could be understood as "ONLY RETURN PART OF ALL WHAT IS AVAILABLE"

We have seen several commands:

`CREATE TABLE`	This command creates a table with its columns.
`DROP TABLE`	Erases a table.
`SELECT`	This command is the foundation of SQL, allows us to create a temporal table containing the necessary data items only. SELECT can take as parameters functions or complex statements, as well as `sub_selects`: select count(*) from states where cod_country in (select cod_country from countries); count ----- 27 (1 row)
`BEGIN WORK`	This is another fundamental command. It tells the DB manager to commit ALL the modifications given since BEGIN WORK. In our particular DB manager a BEGIN WORK marks the initialisation of a transaction, in other managers the beginning of a transaction is mark by the first command that alters something in the database. In postgreSQL all commands that alter data will operate directly unless there was a previous BEGIN WORK. NOTE: commands that modify the scheme of the database execute a COMMIT WORK, therefore if a transaction is opened and any such commands is executed our transaction will be closed immediately and it will be impossible to launch a ROLLBACK WORK. While a user has an open transaction he can declare the access type to his data by other users: Modified data Original Data previous to the transaction Block data access
`COMMIT WORK`	Closes a transaction leaving the committing the modifications introduced. The command ROLLBACK WORK returns the data to their state previous the current transaction.

The concept of transaction is very important given that it let us return to the previous state in case of an error. Let us try this operation, first a "rollback work" to close any previous transactions:

manu=> select * from countries;
cod_country|name
-----------+---------
          1|country 1
          2|country 2
          3|country 3
(3 rows)

There are three rows,

begin work;

Begins a transaction

insert into countries values (5, 'Country Not True');

We inserted a row, let us next verify that all the rows are there

manu=> select * from countries;
cod_country|name
-----------+----------------
          1|country 1
          2|country 2
          3|country 3
          5|Country Not True
(4 rows)

All the rows are there. Next

rollback work;

this abandons the transaction.

manu=> select * from countries;
cod_country|name
-----------+---------
          1|country 1
          2|country 2
          3|country 3
(3 rows)

After checking the number of rows we see it went back to the original 3 rows.

INSERT We have seen this command already. It inserts data in a table.

CREATE TABLE

Another important command, the creation of a table and its columns, let us see now the type of data that can be handle:

`char(range):`	Alfa-numeric data of fixed length 30 bytes.
`varchar(rango):`	Alfa-numeric data of variable length up to 30 bytes.
`int2:`	Numeric binary data of 2 bytes: 2-15 hasta 215
`int4:`	Numeric binary data of 4 bytes: 2-31 - 231
`money:`	Fixed point data number, ex: money(6,3) is a number of six digits, 3 of which are decimal (3 integers and 3 decimals).
`time:`	Temporal data containing hours, minutes, seconds and hundreds of second, HH:MM:SS:CCC
`date:`	Date data containing yer, month and day, AAAA/MM/DD
`timestamp:`	Date and time data as AAAA/MM/DD:HH:MM:SS:CCC
`float(n):`	Real data of single precision.
`float3:`	Real data of double precision.

The definitions of the type of data are specific for every kind of SQL manager, nevertheless there is an SQL standard (the last one is ANSI/92 or SQL/3) that defines a few types with its characteristics. This course we will only see few types specific of PostgreSQL.

DELETE deletes rows of a table

UPDATE modifies the columns of a row in a table.

Summary

Despite our somewhat erratic style, we have introduced SQL and install a relational database.

SQL let us build an abstraction layer to our data and permits us manage them according to our needs.

From what we have seen so far, one could ask: How do I use SQL within an application?

The answer will come one step at a time, in our third article we will review a short C application using SQL.

Translation information:

es --> -- : Manuel Soriano <manu(at)europa3.com>

es --> en: Miguel A Sepulveda <sepulveda(at)linuxfocus.org>

2002-11-02, generated by lfparser version 2.34