Let us examine the parts of this definition in more detail.
Typical examples of information stored for some practical purpose are:
The three main examples given here are of STATIC, GROWING and DYNAMIC databases respectively.
Note that nothing was said in the definition about the quantity of information being held. Although many of the benefits associated with using a database are due to economies of scale, a small database may be very worthwhile (for instance to the secretary of the local sports club) if the information is to be processed frequently and in a repetitive manner.
This part of the definition goes without saying in most people's minds but it is worth dwelling on it for a minute. Because of the investment involved in setting up a database, the expectation must be that it will continue to be useful, over years rather than months. But the relationship with time varies from one type of information to another.
Information (often referred to in this context as data) has been processed by computer for over 30 years, using a variety of storage media. Some form of magnetic disc is likely to be used, since discs currently provide the most cost-effective way of holding large quantities of data while allowing fast access to any individual item. Other methods are obviously under development, notably optical storage - CD Rom - which as yet does not give enough scope for updating in most database applications.
Database handling techniques grew out of earlier and simpler file processing techniques. A file consists of an ordered collection of records; a database consists of two or more related files which we may wish to process together in various different ways. It will store not only the individual records containing the numbers or words needed for some application, but auxiliary information which will allow those records to be accessed more quickly, or which will link related records or data items together. A database designer may be required to choose how much and what sort of auxiliary information to store, using his knowledge of how the database will be used.
Computer storage and processing implies the use of software: in the current context a DATABASE MANAGEMENT SYSTEM (DBMS). The function of the DBMS is to store and retrieve information as required by applications programs or users sitting at terminals, using the facilities provided by the computer operating system. It is one of a number of software layers making computer facilities available to users with perhaps comparatively little technical expertise.
This includes describing:
Validation may include:
In an interactive data entry system, errors should be detected immediately - some can be prevented altogether by keyboard monitoring - and recovery and re-entry permitted.
Updating involves:
At the same time any back-ground data such as indexes or pointers from one record to another must be changed to maintain consistency. Updating may take place interactively, or by submission of a file of transaction records; handling these may require a program of some kind to be written, either in a conventional programming language (a host language, e.g. COBOL or C) or in a language supplied by the DBMS for constructing command files.
For this purpose most systems provide a QUERY LANGUAGE with which the characteristics of the required records may be specified. Query languages differ enormously in power and sophistication but a standard which is becoming increasingly common is based on the so-called RELATIONAL operations. These allow:
Arbitrary combinations of these operators on the files making up a database can answer a very large number of queries without requiring users to go into one record at a time processing.
Most systems provide facilities for describing how summary reports from the database are to be created and laid out on paper. These may include obtaining:
over particular CONTROL FIELDS. Also specification of PAGE and LINE LAYOUT, HEADINGS, PAGE-NUMBERING, and other narrative to make the report comprehensible.
This has several aspects:
An organisation uses a computer to store and process information because it hopes for speed, accuracy, efficiency, economy etc. beyond what could be achieved using clerical methods. The objectives of using a DBMS must in essence be the same although the justifications may be more indirect.
Early computer applications were based on existing clerical methods and stored information was partitioned in much the same way as manual files. But the computer's processing speed gave a potential for RELATING data from different sources to produce valuable manage-ment information, provided that some standardisation could be imposed over departmental boundaries. The idea emerged of the integrated database as a central resource. Data is captured as close as possible to its point of origin and transmitted to the database, then extracted by anyone within the organisation who requires it. However many provisos have become attached to this idea in practice, it still provides possibly the strongest motivation for the introduction of a DBMS in large organisations. The idea is that any piece of information is entered and stored just once, eliminating duplications of effort and the possibility of inconsistency between different departmental records.
Other advantages relate to the task of running a conventional Data Processing (DP) department. Organisational requirements change over time, and applications programs laboriously developed need to be periodically adjusted. A DBMS gives some protection against change by taking care of basic storage and retrieval functions in a standard way, leaving the applications developer to concentrate on specific organisational requirements. Changes in one of these areas need not have repercussions elsewhere. In general a DBMS is a substantial piece of software, the result of many man-years of effort. Because its development costs are spread over a number of purchasers it can probably provide more facilities than would be economic in a one-off product.
The points discussed above are probably most relevant to the larger organisation using a DBMS for its administrative functions - the environment in which the idea of databases first originated. In other contexts the convenience of a DBMS may be the primary consideration. The purchaser of a small business computer needs all the software to run it in package form, written so that the minimum of expertise is required to use it. The same applies to departments (e.g. Research & Development) with special needs which cannot be satisfied by a large centralised system. When comparing database management systems it is obvious that some are designed in the expectation that professional DP staff will be available to run them, while others are aimed at the total novice.
There are of course costs associated with adopting a DBMS. Actual monetary costs vary widely from, for instance, a large multi-user Oracle system to a small PC-based filing system. In the first case the charge will cover support, some training, extensive documentation and the provision of periodical upgrades to the software; in the second case the purchaser will be on his own with the manual. But there is also a tendency for the cost of software to reflect the cost of the hardware on which it is run!
Probably the main cost associated with acquiring a DBMS is due to the work involved in designing and implementing systems to use it. In order to provide a general and powerful set of facilities for its users any DBMS imposes restraints on the way information can be described and accessed, and demands familiarity with the DATA MODEL which it supports and the command language which it provides to define and manipulate data. Data models still in use are HIERARCHICAL (tree-structured), NETWORK and RELATIONAL (tabular). Of these the last is the current favourite, providing a good basis for high-level query languages and giving scope for the exploitation of special-purpose hardware in efficient large-scale data handling.
This course will concentrate on the RELATIONAL model.
The conventional SYSTEMS LIFE CYCLE consists of:
1.4.1. ANALYSIS 1.4.2. DESIGN 1.4.3. DEVELOPMENT 1.4.4. IMPLEMENTATION 1.4.5. MAINTENANCE
In practice these phases are not always sharply distinguished; for small projects it may not be necessary to go formally through every one. The move from one phase to the next is essentially a move from the general to the specific. At each stage, particularly where a DBMS is involved, we shall be concerned both with information and with processes to be performed using that information.
The outputs from this stage should be:
This stage should produce:
Specification of the database itself must now come down another level, to decisions about PHYSICAL DATA STORAGE in particular files on particular devices. For this a knowledge of the computer operating system, as well as the DBMS, is required. Conventional program development - coding, testing, debugging etc. may also be done. If a totally packaged system has been purchased this may not be necessary - it will simply be a matter of discovering how to use the command and query language already supplied to store and retrieve data, generate reports and other outputs. Even here an element of testing and debugging may be involved, since it is unlikely that the new user of a system will get it exactly right the first time. It is certainly inadvisable for this sort of experimentation to take place using a live database!
This puts the work of the previous three phases into everyday use. It involves such things as loading the database with live rather than test data, staff training, probably the introduction of new working practices. It is not unusual to have an old and a new system running side by side for a while so that some back-up is available if the new system fails unexpectedly.
Systems once implemented generally require further work done on them as time goes by, either to correct original design faults or to accommodate changes in user requirements or operating constraints. One of the objectives of using a DBMS is to reduce the impact of such changes - for example the data can be physically re-arranged without affecting the logic of the programs which use it. Some DBMSs provide utility programs to re-organise the data when either its physical or logical design must be altered.