Beyond CASE technologies: semantic standardization of metadata

L. Reingold

Along with computer hardware, systems for automated development of software for data processing are progressing.

For example, CASE technologies are offering us increasingly advanced methodological and programming tools for modeling the surrounding world and developing required software.

Is it possible to go even further along the road of formalizing the interaction between man and the world?

Can a methodology be created that would, in the long run, make it possible to dispense with applied programming as it is understood today? This assumption seems realistic.

In what follows we consider certain approaches that enable advances in this area.

Objects and indicators

The surrounding world can be represented as a set of objects.

At present each analysis of any such object is unique. The interface for handling information on the object is also unique (except for general conventions on the interface in the operating system and peculiarities of the programming tools being used).

However, the diversity of the object environment surrounding us has its limits. The growing capabilities of standard computer hardware are becoming sufficient for an adequate structured description of man's environment. The existing methodology can be extended by introducing the concept of an indicator, which is appropriate in the present context. The terms traditionally used in modeling technologies, such as attribute, domain, property, method and object, require a more precise definition in our context.

Each object and its function can be described by a set of indicators.

Here, an indicator is defined as a relevant characteristic of an object: an arbitrary information element reflecting the structure or behavior of the object in question.

An indicator can be a number, a formalized or unformalized text, a graphics or audio element, a program module, or the result of a program module's operation. Thus, an indicator is any data element described in a comprehensive unified way, or an operation on the data of a certain object. When an object is represented as a set of indicators, the difference between attributes and operations is of no crucial importance: operations are also indicators, represented in a specific way.

An indicator has traditional features of an attribute such as format, output mask, type of value, default value, etc.

In addition, it has extra features: a unified code, a measurement unit, the frequency with which values occur or change, and a (semantically) parent indicator. A child indicator inherits the semantic features of its parent and refines them where necessary. For example: size > length > length of a car.
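The parent/child relation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Indicator`, its fields, and the code format `IND-0001` are hypothetical and do not come from the paper; the only thing taken from the text is the chain size > length > length of a car and the idea that a child inherits what it does not refine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Indicator:
    """Hypothetical description of one unified indicator."""
    code: str                              # unified, permanent code (format is an assumption)
    name: str
    value_type: type = float               # traditional attribute feature
    unit: Optional[str] = None             # measurement unit; None means "not refined here"
    parent: Optional["Indicator"] = None   # semantic parent indicator

    def effective_unit(self) -> Optional[str]:
        """Return this indicator's unit, or the nearest ancestor's unit."""
        node: Optional[Indicator] = self
        while node is not None:
            if node.unit is not None:
                return node.unit
            node = node.parent
        return None

    def lineage(self) -> List[str]:
        """Return names from the most general ancestor down to this indicator."""
        names: List[str] = []
        node: Optional[Indicator] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


size = Indicator("IND-0001", "size")
length = Indicator("IND-0002", "length", unit="m", parent=size)
car_length = Indicator("IND-0003", "length of a car", parent=length)
```

Here `car_length` does not declare a unit of its own, so `effective_unit()` walks up to `length`; a child that needed millimetres would simply set its own `unit`.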

There should always be a certain set of indicators unequivocally identifying the object in the relevant respect among other similar objects.

Each indicator is subject to unification: at the time of the first occurrence it is stored in a repository, granted a permanent code and then used in all cases where its semantics matches the required one (within the area of the repository's accessibility).
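A minimal sketch of this unification step follows. It assumes, purely for illustration, that "matching semantics" can be reduced to comparing a normalized name and a measurement unit; real semantic matching is a much harder problem. The class `IndicatorRepository` and the code format are hypothetical.

```python
import itertools


class IndicatorRepository:
    """Hypothetical repository that grants each indicator a permanent code."""

    def __init__(self):
        self._codes = {}                    # (normalized name, unit) -> code
        self._counter = itertools.count(1)  # source of new code numbers

    def register(self, name, unit=None):
        """Return the existing code if the indicator is known, else grant a new one."""
        key = (name.strip().lower(), unit)
        if key not in self._codes:
            # First occurrence: store the indicator and grant a permanent code.
            self._codes[key] = "IND-{:06d}".format(next(self._counter))
        return self._codes[key]


repo = IndicatorRepository()
first = repo.register("Length of a car", unit="m")
again = repo.register("length of a car ", unit="m")   # same semantics, same code
other = repo.register("length of a car", unit="ft")   # different unit, new code
```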

Each indicator for each selected copy of an object assumes a certain value at each point in time.

Each value stored in the system has a certain moment (date, time) of emergence. If an indicator is changing in time, one can speak about a history of values for the indicator. The history of values is one of an indicator's characteristics.
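The history of values can be pictured as a time-ordered series from which the value valid at any moment can be read back. The sketch below is one possible in-memory representation, not a prescribed one; the class `ValueHistory` and its method names are assumptions made for the example.

```python
import bisect
from datetime import datetime


class ValueHistory:
    """Hypothetical time-ordered history of one indicator for one object."""

    def __init__(self):
        self._times = []    # moments of emergence, kept sorted
        self._values = []   # values, parallel to self._times

    def record(self, when, value):
        """Add a value that emerged at the given moment."""
        i = bisect.bisect_right(self._times, when)
        self._times.insert(i, when)
        self._values.insert(i, value)

    def value_at(self, when):
        """Return the latest value that emerged at or before `when`, or None."""
        i = bisect.bisect_right(self._times, when)
        return self._values[i - 1] if i else None


colour = ValueHistory()
colour.record(datetime(1998, 5, 1), "red")
colour.record(datetime(1999, 3, 15), "blue")
```

With these two entries, `colour.value_at(datetime(1998, 12, 31))` yields the earlier value and a query before the first entry yields `None`.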

To obtain a set of indicators characterizing a certain object in a relevant context, one of the existing methodologies of structural or object modeling can be used, the object being treated as a structured set of indicators (structured, e.g., as a hierarchy tree).

In other words, indicators are data unified in semantics, format, measurement unit and access conditions, whose history can be stored in accordance with the level of the repository's accessibility.

By extending generally accepted CASE notations, model description tools can be incorporated into them that allow the object in question to be represented as a set of indicators. The most detailed level of the model's elaboration should be an integral system of indicators constituting the model.

If a change in the idea of the object necessitates reconstruction of a certain subset of the tree of indicators, this reconstruction is in principle formalizable and can be carried out automatically, involving both the structure and the values of the indicators.

To enter values of indicators into the system and use these values for various purposes, a standard interface for this type of objects can be developed.

Typology of indicators

A set of indicators is specific to each type of object. However, their typological groups can be singled out. A typology of indicators is needed for information search, ordering and analysis.

Some basic types of indicators are discussed below.

Individual and group indicators

A distinction can be made between specific indicators reflecting a specific property that have their own values and those uniting several other indicators into a group. The latter will be referred to as group indicators. Group indicators can contain both individual and other group indicators.

An individual indicator can be converted into a group indicator, if a more detailed description of the object is needed. Values of the indicator can be converted, where it makes sense, into appropriate meaningful values in the new group. For example, an indicator having a certain text as a value can be converted into a group of indicators whereby the text is split into appropriate fragments in accordance with a specified algorithm.
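As a sketch of such a conversion, the function below turns a text-valued indicator into a group whose members hold the fragments. The splitting rule (sentences separated by full stops) and the member naming scheme `name[i]` are assumptions chosen only to make the example concrete; any specified algorithm could be passed in instead.

```python
def split_sentences(text):
    """Assumed splitting algorithm: cut on full stops, drop empty fragments."""
    return [part.strip() for part in text.split(".") if part.strip()]


def to_group(name, text, splitter):
    """Convert an individual text indicator into a group of indicators.

    Returns a dict mapping hypothetical member names to text fragments.
    """
    fragments = splitter(text)
    return {"{}[{}]".format(name, i): frag
            for i, frag in enumerate(fragments, start=1)}


group = to_group("description", "Red car. Four doors.", split_sentences)
```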

Identification indicators and information indicators

Identification indicators are used to distinguish between copies of objects under consideration. They should be sufficient for unequivocal identification of objects in the subject area in question.

Identification indicators may include the unified code of the indicator, the date and time at which a value of the indicator was entered, and a reference to the source of the indicator's value.

The indicator identification system should be global for the type of objects under consideration and, if necessary, possess tools for converting the existing local identification systems into global ones.

Static and dynamic (behavior) indicators

Indicators are divided into those reflecting the structure of the object and those characterizing its change. The latter can either assume a specific value or be represented by algorithms whose software implementation is the value of the corresponding parameter.

Multimedia indicators

Values of indicators can be arbitrary binary data, including those containing audio and video information. In the case of more detailed structuring of the value they can be represented as a group containing more detailed values.

Templates of indicators

The same indicators and their groups can and should be used in description of various objects. It is important to make sure that indicators of the same type can be used in various types and copies of objects.

Therefore a vital issue is the construction of generally accessible repositories of indicators and of their stable groups, ensuring compatibility of data even where this was not originally intended.

If a new indicator or a group of indicators comes into existence, it is included in the repository. If the new indicator is derived by transformation from a local system, the repository should also contain the algorithm for transforming it into the local representation and the reverse algorithm.

Templates of indicators form a basis for the construction of compatible corpora of data. Compatibility can be of two basic types: compatibility at the coding level and semantic compatibility. Whereas the problem of coding compatibility is technically clear (it is quite soluble, even if technically intricate), achieving semantic compatibility requires additional research.
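The pair of transformation algorithms can be stored next to the indicator as two functions, one for each direction. The sketch below assumes a local system that records a length in inches and a repository indicator measured in metres; the class `LocalMapping`, the local field name `len_in` and the code `IND-0002` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LocalMapping:
    """Hypothetical record linking a local field to a unified indicator."""
    local_name: str
    unified_code: str
    to_unified: Callable[[float], float]   # local value -> repository value
    to_local: Callable[[float], float]     # repository value -> local value


METRES_PER_INCH = 0.0254  # exact by definition of the international inch

inches_mapping = LocalMapping(
    local_name="len_in",
    unified_code="IND-0002",
    to_unified=lambda inches: inches * METRES_PER_INCH,
    to_local=lambda metres: metres / METRES_PER_INCH,
)

unified_value = inches_mapping.to_unified(100.0)        # length in metres
round_trip = inches_mapping.to_local(unified_value)     # back to inches
```

Keeping both directions together makes it possible to check that a round trip returns the original local value, up to floating-point rounding.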

History of indicators

A value of an indicator always emerges as a characteristic of a certain specific object at a given point in time. At least two moments should be stored in the indicator value database:

1. Point in time when this value characterizes the object

2. Point in time when this value was stored

Another important characteristic of an indicator's value that proves useful in many cases is information on the source of the value.

It is clear that for a number of objects the same value for an indicator can emerge at different times and come from different sources.

It is assumed in this context that the source of data is an object like the others whose structure and values of indicators are described in the indicator value database. A reference to the data source is an integral feature of the indicator's value.

In other words, under the present concept each value of an indicator in the indicator value repository can have several versions originating from various times and various sources.

For example, the strength of an army of a specific country in a specific war in a specific year can be reported differently by the various conflicting parties and by various organizations within the countries, and the reported figures can vary in the course of time. All these values (including obsolete ones that were later updated) have a right to exist.
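A value record carrying both points in time and a source reference might look as follows. This is a sketch under assumptions: the class `ValueVersion` and the function `known_versions` are hypothetical, and the figures, dates and source names are placeholders invented for the example, not historical data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ValueVersion:
    """Hypothetical version of one indicator value."""
    value: int
    valid_at: date    # point in time when the value characterizes the object
    stored_at: date   # point in time when the value was stored
    source: str       # reference to the data source


# Placeholder figures only; three versions of the same indicator value.
versions = [
    ValueVersion(100, date(1900, 1, 1), date(1900, 6, 1), "party A"),
    ValueVersion(120, date(1900, 1, 1), date(1900, 7, 1), "party B"),
    ValueVersion(110, date(1900, 1, 1), date(1950, 1, 1), "party A"),  # later revision
]


def known_versions(all_versions, valid_at, as_of):
    """Return every version for `valid_at` that had been stored by `as_of`."""
    return [v for v in all_versions
            if v.valid_at == valid_at and v.stored_at <= as_of]


early = known_versions(versions, date(1900, 1, 1), as_of=date(1900, 12, 31))
late = known_versions(versions, date(1900, 1, 1), as_of=date(2000, 1, 1))
```

Nothing is overwritten: the later revision from "party A" is added beside the earlier figure, so a query as of an earlier date still sees what was known then.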

Distinctive features of indicators

The main distinctions of indicators from properties and methods of objects common in object programming are as follows:

- unified description accessible to a specified community (in the general case, to all network users);

- history of values;

- history of the structure;

- if necessary, the history of indicators can be alienated (held independently) from the source of information;

- unified interfaces, stored together with their descriptions, can be used to handle indicators and their groups: to enter or modify indicator values in copies of objects and to analyze the data.

Thus, a value of an indicator is not just a certain number or string, but a value containing required information on its own semantics.

Standard models of objects

The surrounding reality includes various objects, some of which are of particular practical interest.

Each object can be described by a set of unified indicators. Structural models of objects can be constructed that generalize the properties of similar concrete objects. Two basic approaches to constructing such standard models can be singled out. First, formalized descriptions of objects can be produced on the basis of certain methodological assumptions. Second, the same can be achieved by reengineering existing developments (in terms of both data structures and software modules ensuring the required functionality). The final stage of reengineering involves converting the available data into a new form that makes it possible to operate according to the new methodology.

Construction of standard models requires consistent descriptions and appropriate classification of relevant objects. The description should contain data allowing the object to be identified in the necessary context as well as structured descriptions representing various aspects of the object, for example a general description of the object, its manufacturing, its various applications and its marketing.

The system should contain tools for identification of the particular type of object and reconciliation of its various models in view of possible intersection of data from various models. Intersecting data of one model should be automatically used by another model.

Availability of a standard description of an object allows the data of the object to be handled by standard tools. For example, standard control tools can be used to handle information on the object: to enter and modify data, and to search and analyze information. That is, data descriptions can be accompanied by large-module application fragments for handling the data, which require no complicated programming to adjust and use. Such fragments are easy for the user to grasp because they relate to concrete data.

It is worth noting that the present approach imposes no limits on alternative descriptions representing various views of the object. However, in my opinion, existing CASE technologies are not capable enough of adequately reflecting the possible alternativeness of reality. The tools need further refinement so that alternatives can be reflected adequately in the structure of the model and correspondences can be established between alternative and semantically intersecting descriptions of objects.

Since the object undergoes changes in the course of time, both values of indicators describing the object and the structure of description are changing.

Changes in the object are characterized by time series of values of indicators forming the description of the object.

Changes in the structure of the objects' description alter the structure of the time series. A change in the structure of description does not entail loss of values obtained in the previous structure, because the history of values is supplemented by the history of changes in the structure of objects.
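One way to picture this is to number the successive structures of a description and to tag every stored value with the structure version under which it was recorded. The sketch below is an assumption about how that could be arranged, with hypothetical names (`ObjectDescription`, `restructure`, `set_value`); it is not the only possible design.

```python
class ObjectDescription:
    """Hypothetical description whose structure and values both keep history."""

    def __init__(self, indicators):
        self.structure_history = [tuple(indicators)]  # version 0
        self.values = []   # (structure version, indicator name, value)

    def restructure(self, indicators):
        """Record a new structure; earlier structures and values are kept."""
        self.structure_history.append(tuple(indicators))

    def structure_at(self, version):
        """Return the indicator names that made up the given structure version."""
        return self.structure_history[version]

    def set_value(self, indicator, value):
        """Store a value under the current structure version."""
        version = len(self.structure_history) - 1
        if indicator not in self.structure_history[version]:
            raise KeyError(indicator)
        self.values.append((version, indicator, value))


car = ObjectDescription(["length"])
car.set_value("length", 4.5)
car.restructure(["length", "width"])   # the description becomes more detailed
car.set_value("width", 1.8)
```

After the restructuring, the value recorded under version 0 is still present and can be interpreted against `car.structure_at(0)`.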

An endless and ultimately detailed history is theoretically possible, although it will be formed in reality within conceptually prescribed accuracy.

A turning point in the history of humankind can be anticipated when a detailed history of everything essential in our reality comes into being.

This is a history of states and events in the lives of individual persons, buildings, outstanding mechanisms and machines, and any natural objects; a history of artificial and virtual objects and of events in social and political life, formalized and permitting any analysis.

Any object (natural, biological, an artifact, informational) that has ever existed on Earth and attracted somebody's attention will stay imprinted in the structured computer memory.

The more interesting an object is to people, the more varied its description will be. It is natural to expect the thoroughness of the description to increase with time.

A description of each object should include the primary key to distinguish it from similar ones irrespective of viewpoint and information source. There should also be a mechanism for splitting values of indicators and their merging together in the case of errors or discrepancies in their identification.

Repositories for accumulation and dissemination of metadata are needed that contain:

- a catalogue of descriptions of indicators;

- a catalogue of standard objects described by unified indicators;

- indicator value databases for copies of objects (an optional feature for the metadata repository which, however, proves natural in many cases).

Repositories and databases of indicator values can be classified by generality and attribution of objects described in them as follows:

- supranational;

- national;

- private notary;

- corporate;

- private.

There are no barriers for repositories of various levels to interact for the purpose of constructing a compatible infrastructure of metadata.

Availability of a repository is a strong incentive to semantic and functional standardization of descriptions of man's environment. A repository also makes sense as a means of alienating information from its source: the data source will be unable to arbitrarily change the structure of the metadata description, or the data obtained using the metadata, in ways that could entail loss of information and of compatibility with previous data versions. This is important in some applications.

The epoch of "universal memory" will generate new problems: philosophical, moral, legal, psychological, economic. Another problem is that of authenticity, that is, the absence of distortions in data and their descriptions. It has technical, mathematical, organizational and social aspects. These are beyond the scope of the present paper.

Mechanisms are needed for verification of information, differentiation of access for maximum protection of interests of individuals and their associations. Again, this is easier to accomplish and monitor within a unified, conceptually consistent environment permitting implementation of various democratic control mechanisms.

Structuring of reality

Since the bulk of available information has been accumulated by mankind in the form of coherent text, and coherent text is convenient for perception, the advent of technologies for automated structuring of textual descriptions of objects and, inversely, for generating textual descriptions from structured systems can be anticipated.

Broadly speaking, complexity of essential aspects of the surrounding world is finite and therefore complexity of the system of indicators and their values required for their description is also finite.

The following general procedure for structuring relevant aspects of reality can be proposed:

1. Universal semantic classification and structured description of objects of reality.

2. Identification of a set of practically relevant aspects (contexts) of each object of current interest.

3. Construction of a model of the object on the basis of descriptions of standard objects and additional indicators characterizing the object.

4. Development of standard methods for interaction with the object of a given type in a given context, for example, generating a copy of the object and filling it with data, search and analysis of information on similar objects.

Under this approach, one basic function of the state in the long run is the deposition of data on objects within its competence and the regulation of access to the data for individuals and organizations, in accordance with the access differentiation required by regulatory documents and other factors. Many seemingly managerial and control functions can be reduced to this activity.

In many cases managerial inputs are a result of simple transformation of original data. Such management can be implemented in the data repository in an automatic mode.

Handling the objects

The above can be illustrated by the following diagram. Description of the object's structure and behavior forms a basis for the database of indicator values describing each copy of the object. Each value of an indicator has its history in terms of both structure and values and is open to new information, because it is always associated with a certain source.

[Diagram callout: No applied programming required any more]

[Diagram callout: Repository of unified descriptions accessible via the Internet (Intranet)]

This approach permits accumulation of consistent structured data on any subject.

If the system complexity is increased, the data will not be lost. They will be automatically transformed into a new format or will remain in the history of states of the object.

Treating structured data and the technology for processing them as a global resource requires a new approach to interface design, because the interface is not static and changes in accordance with the end user's requests. The RAD interface, and especially the command interface, appear too complicated for this purpose.

An interface for handling standardized descriptions of indicators is needed that is accessible to the end user and possesses appropriate functionality.

Many elements of the description that are essential but hard for the end user to understand can be concealed from the user. Default values can come from the indicator repository.

The advent of a specific network repository-based RAD technology that is essentially understandable to the end user can be anticipated.

To solve this problem at the end-user level, the best option may be manipulation of 3-D graphic models of objects, which looks more like a game than like interface construction or adjustment by the end user for specific needs.

It is worth noting that the proposed approach does not rule out possible use of various alternative interfaces for handling the same types of objects.

DBMS and the proposed methodology

In the author's opinion, the proposed approach has an impact on the DBMS methodology.

A qualitative leap in the development of DBMSs can be expected in the near future. This is inevitable at a time when 20 GB of direct-access storage sells for $200 and each end user has a supercomputer of the recent past on the desk. In these conditions it appears natural that computer systems should increase their capabilities in reflecting the surrounding world.

What is happening around us proceeds in a certain historical sequence. Our views of our environment also have their history, and are updated and modified in accordance with current requirements and ideas.

At the same time, methodologies used in designing databases and applications for handling databases do not directly take into consideration the time factor.

Thus, the object methodology is conceptually incomplete without the history of objects. It is aimed at solving momentary problems without reflecting the genesis of the object.

It is often unknown how the data are going to be used in the future. Therefore a unified mechanism is needed associating the data at the semantic level, the classifier code level and in the time aspect.

These problems can be tackled within the framework of the proposed methodology.

The methodology permits development of a DBMS with built-in support for histories of data structure and data values. In modern DBMSs the data dictionary and the database itself contain only the current state of the database structure and the current data values, and offer no tools for maintaining their history.

Applications that require some support of data history, including those developed with the author's participation, turn out to be clumsy and complicated to maintain at both the system and the application level.

Let us call DBMSs free of the above disadvantages 'absolute memorization' DBMSs (AM DBMSs). Such a DBMS will memorize the history of its own structure and all previous values of data. It may even ensure the invariability of previous values through mechanisms incorporated into the core of the database as well as through distributed data storage. There should naturally be tools to control the formation of the built-in history. The time of emergence of a value in the history of an indicator is the moment of completion of the transaction storing the next value in the database. However, other conditions for the emergence of a history of indicator values are also possible.
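An existing DBMS can imitate part of this behavior. The sketch below uses Python's standard `sqlite3` module: every new value is inserted as a new row, the storage time is filled in by the database, and two triggers reject any UPDATE or DELETE. The table and column names are assumptions, and this is an approximation built on a conventional DBMS, not the special-purpose AM DBMS the paper has in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE indicator_value (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    object_id  TEXT NOT NULL,
    indicator  TEXT NOT NULL,
    value      TEXT NOT NULL,
    stored_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Previous values must stay invariable: reject changes and deletions.
CREATE TRIGGER no_update BEFORE UPDATE ON indicator_value
BEGIN SELECT RAISE(ABORT, 'history is append-only'); END;
CREATE TRIGGER no_delete BEFORE DELETE ON indicator_value
BEGIN SELECT RAISE(ABORT, 'history is append-only'); END;
""")


def store(object_id, indicator, value):
    """Append the next value; the transaction commits when the block exits."""
    with conn:
        conn.execute(
            "INSERT INTO indicator_value (object_id, indicator, value) "
            "VALUES (?, ?, ?)",
            (object_id, indicator, str(value)),
        )


def history(object_id, indicator):
    """Return all stored values, oldest first."""
    rows = conn.execute(
        "SELECT value FROM indicator_value "
        "WHERE object_id = ? AND indicator = ? ORDER BY id",
        (object_id, indicator),
    ).fetchall()
    return [row[0] for row in rows]


def current(object_id, indicator):
    """Return the most recently stored value, or None."""
    values = history(object_id, indicator)
    return values[-1] if values else None
```

An attempt such as `conn.execute("UPDATE indicator_value SET value = 'x'")` is aborted by the trigger and raises a `sqlite3.DatabaseError`, so the only way to change a value is to store a new version.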

Apparently, such a DBMS can be implemented using available capabilities. The logging tools of a DBMS can be used to construct a history on the basis of the various events that take place when the database is changed, but they are only a ready-made, general-purpose solution for the problem in question. Only a special implementation can be fundamentally more efficient. Among other things, it should contain:

- various modes for accessing the history;

- possibility of assigning groups of objects whose history is to be maintained and the maintenance mode (including complete history of all objects in the database);

- an extended concept of the data dictionary offering support for indicator structure/repository history;

- mechanism of differentiating access to the history;

- additional tools for effective storage of new types of data;

- an extended query language at the necessary levels (the traditional query language SQL and the superstructures over it that are usually incorporated into the core and into DBMS application development tools are not enough);

- possibility of interaction with external repositories of indicators;

- inbuilt tools to prevent changes in the data history.

If the mode preventing changes to old data history is on in this DBMS, old data cannot be changed without leaving traces. In an extreme case, a copy of the database whose history has been changed can be rendered inoperative; that is, a change in history can entail a reset to the latest authentic state. To implement this feature, public-key cryptography and other protection methods developed specifically for electronic commerce systems can be used.
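The idea that old data cannot be changed without leaving traces can be illustrated with a hash chain, in which each history entry is hashed together with the hash of the entry before it. Note that this is a simpler stand-in chosen for the example, not the public-key technology the text mentions; a fuller design would additionally sign the latest hash. The class `TamperEvidentHistory` and its methods are hypothetical.

```python
import hashlib
import json


def _digest(prev_hash, payload):
    """Hash an entry together with the hash of its predecessor."""
    data = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + data).hexdigest()


class TamperEvidentHistory:
    """Hypothetical history in which any silent change breaks the chain."""

    GENESIS = "0" * 64   # conventional starting hash for the first entry

    def __init__(self):
        self.entries = []   # list of (payload, hash) pairs

    def append(self, payload):
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        self.entries.append((payload, _digest(prev, payload)))

    def verify(self):
        """Recompute the chain; return False if any entry was altered."""
        prev = self.GENESIS
        for payload, stored_hash in self.entries:
            if _digest(prev, payload) != stored_hash:
                return False
            prev = stored_hash
        return True


log = TamperEvidentHistory()
log.append({"indicator": "colour", "value": "red"})
log.append({"indicator": "colour", "value": "blue"})
intact = log.verify()

# Simulate a silent rewrite of an old value while keeping its old hash.
old_payload, old_hash = log.entries[0]
log.entries[0] = ({"indicator": "colour", "value": "green"}, old_hash)
after_tampering = log.verify()
```

A failed `verify()` is the signal on which a system could refuse to operate or fall back to the latest authentic state.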

Fundamentally new mechanisms can be developed that are at present in no way implemented in current DBMSs. For example, transactions can be supported when the structure of the object is changed.

Given a special implementation, DBMSs whose data are undeletable can prove very effective. Since the data in this case can be represented in a more compact way, they need no reorganization in the course of operation.

The proposed approach will entail changes in implementations of client-server technologies that are now generally used in various versions.

Somewhat exaggerating, it can be assumed that in constructing applications the proposed approach will make it possible to switch from component programming at the level of form elements to the design level where an element is a group of forms for handling structured, standardized data aggregates.

The following figure shows one of the possible ways to carry out this change. A more detailed consideration of the issue is beyond the scope of the present paper.

Internet and the proposed system

Practical implementation of the system will result in a new Internet technology. This will include support for a repository of objects and their descriptions as well as mechanisms for handling them.

This service can be quite multipurpose. At various levels of decomposition the same object can be seen in different ways and from various viewpoints, such as those of a system analyst, a system end user and an information system developer. Each type of repository user will have an individual access level in terms of the set of accessible functions and the admissible use and change of the repository's objects.

The object model of man's environment will be accessed, e.g., through the Web interface. Handling the model will include:

1. Selection of the object and context for its consideration.

2. Selection of a subset of the object's relevant characteristics.

3. Interpretation of data.

4. Restoration, if necessary, of the original form of the data for subsequent analysis.

This may entail the advent of a new Internet service, a "structured Web", with its own interface, adjustment tools and access/programming system.

Unlike the usual Web service, this service will be designed from the very start for handling structured objects rather than arbitrary texts.

Conclusion

The author hopes that the (probably polemical) conceptual principles set forth in the paper will help reassess possible trends in computer technology development and provide a basis for fruitful discussion of the issues dealt with.

The proposed methodology may require considerable memory and processing resources. However, the unprecedented growth of storage capacities, of the speed of generally available processors and of computer-to-computer communication possibilities is bound to produce qualitative changes. Introduction of the proposed methodology may be one such consequence.

In my opinion, the possibility of introducing computer data history at the most general methodological level will have major consequences in various areas of human activity.

As an example at the most general social and philosophical level, the possibility of setting up a computer-based "truth ministry" that readily rewrites history in accordance with the wishes of the current top leadership, as described by George Orwell in his novel "1984", can be eliminated in principle. Indeed, paper media are now increasingly being replaced by Internet technologies, and each Web site is a miniature "truth ministry" where the data can be arbitrarily changed depending on the current situation. Such examples are known to the author.

It is quite possible that our posterity will get acquainted with our life not only through photographs, films and old postcards, but also through structured data stored in databases providing history support.

12.20.1999.