Database Transformation,
Synthesis and Playback Technology
by David M. Schwartz, CEO, ImaginOn, Inc.  24 October 1996

Synopsis
The current state of the art in database software is the hierarchical relational database, together with the search engines that access it. This technology works well for most of the local data storage, retrieval and presentation tasks undertaken by businesses and institutions. Nonetheless, even in these "normal" areas, problems have always arisen at the boundaries between databases and among databases of different formats. For tasks outside the normal course of business management, traditional database structures, methodologies and tools rarely perform adequately. These areas of inadequate performance include large chaotic networks such as the Internet, databases that must be accessed in real time such as film, video and audio clips, and extremely complex, re-entrant dynamic data structures such as large computer programs.

ImaginOn technology solves the access, retrieval and presentation problems posed by data structures and systems such as those mentioned above by applying the principles of database transformation, network synthesis and automated adaptive playback. Database transforms are the processes by which a source database structure and its contents are converted into an output database that has a specified relationship to the source. Network synthesis is the process by which a multidimensional network is created according to a set of rules. Automated adaptive playback is the process by which the data within any network structure can be presented sequentially on a non-linear path jointly determined by the user's preferences and the rules of the process. To date, three distinct commercial applications of ImaginOn technology have been identified: interactive film authoring and playback, World Wide Web data processing, and automated testing of computer programs.

Background
The history of mainstream business computer database technology is the story of the transition from flat-file databases to relational databases. In the engineering community, hierarchical databases of both flat and relational types have been used in specialty applications like computer aided design and project management. Flat-file databases consist simply of data entries in the form of records containing a set of fields delimited from one another by commas (or another character). Records are separated either by an end-of-record character or by the implied boundary of a fixed record length. Flat-file databases are accessed by reference to the record number itself or by searching for a user-defined string (set of characters) within the database, which then returns the ID numbers of all records that contain the string.
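Both flat-file access methods can be shown in a few lines. This is a minimal sketch. It assumes newline-terminated, comma-delimited records (or fixed-length records) and record numbers that start at 1. The function names are hypothetical and are not taken from any particular product.

```python
def split_delimited(text, record_sep="\n"):
    """Split text into records at each end-of-record character."""
    return [rec for rec in text.split(record_sep) if rec]


def split_fixed_length(text, record_len):
    """Split text into records using the implied boundary of a fixed record length."""
    return [text[i:i + record_len] for i in range(0, len(text), record_len)]


def fields(record, field_sep=","):
    """Return the fields of one record, split on the field delimiter."""
    return record.split(field_sep)


def get_record(records, number):
    """Access a record directly by its (1-based) record number."""
    return records[number - 1]


def find_records(records, needle):
    """String search: return the record numbers of all records containing needle."""
    return [n for n, rec in enumerate(records, start=1) if needle in rec]


data = "Jones,Ford,1972\nSmith,Chevrolet,1975\nJones,Dodge,1980\n"
records = split_delimited(data)
print(find_records(records, "Jones"))   # [1, 3]
print(fields(get_record(records, 2)))   # ['Smith', 'Chevrolet', '1975']
```

Note that the string search looks only at raw record text. It knows nothing about which field a match falls in, and field-level addressing is exactly the capability that relational databases add.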

Relational databases represent an improvement over flat-file databases in that the database is addressable down to the field level. This gives the user powerful sorting and selection capabilities. A cross-section of a relational database can be retrieved based on attributes within a specific field or by combinatorial logical (Boolean) operations on all fields. For example, in the California DMV database, finding all the families with the surname "Jones" who own 1972 Ford sedans is a simple matter of querying the database for "Jones and Ford and sedan and 1972".
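In present-day SQL terms, the DMV query above is a conjunction of field-level predicates. The sketch below uses Python's built-in sqlite3 module. The table layout and the rows are invented for illustration and do not represent the real DMV schema.

```python
import sqlite3


def build_registrations():
    """Create a small in-memory table of (hypothetical) vehicle registrations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE registrations (surname TEXT, make TEXT, body TEXT, year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO registrations VALUES (?, ?, ?, ?)",
        [
            ("Jones", "Ford", "sedan", 1972),
            ("Jones", "Ford", "pickup", 1972),
            ("Smith", "Ford", "sedan", 1972),
            ("Jones", "Ford", "sedan", 1972),
        ],
    )
    return conn


def find_vehicles(conn, surname, make, body, year):
    """Boolean AND across four fields; returns matching row ids in ascending order."""
    rows = conn.execute(
        "SELECT rowid FROM registrations "
        "WHERE surname = ? AND make = ? AND body = ? AND year = ? "
        "ORDER BY rowid",
        (surname, make, body, year),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_registrations()
print(find_vehicles(conn, "Jones", "Ford", "sedan", 1972))  # [1, 4]
```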

Hierarchical relational databases have the additional feature of "nesting". Nesting refers to a multi-level aspect of the database. In computer aided design, the top level of a design is frequently a simple block diagram showing the main sections of the design, such as CPU, Memory, I/O, Peripherals, and Power Supply. The connections to and from each block are indicated. The next level in the hierarchy is the design of each section itself. Then, within each section there may be custom integrated circuits that are themselves the subject of another level down in the hierarchy. With a hierarchy, the user gains the ability to access data specific to the level of detail under investigation, without the additional detail contained in the next level down. The limits of a database search can then be defined in terms of depth as well as relationships within a level.
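Limiting a search by depth amounts to a tree walk that stops descending below a chosen level. The sketch below models the CAD example as nested blocks. The Block class and the "Custom IC" child are hypothetical illustrations.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One node of a design hierarchy; children are the next level of detail."""
    name: str
    children: list = field(default_factory=list)  # child Block objects


def blocks_to_depth(root, max_depth):
    """Pre-order walk that stops descending below max_depth (the root is depth 0)."""
    found = []

    def walk(block, depth):
        if depth > max_depth:
            return
        found.append((depth, block.name))
        for child in block.children:
            walk(child, depth + 1)

    walk(root, 0)
    return found


design = Block("System", [
    Block("CPU", [Block("Custom IC")]),
    Block("Memory"),
    Block("I/O"),
    Block("Peripherals"),
    Block("Power Supply"),
])
print(blocks_to_depth(design, 1))  # the top-level block diagram only
```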

Independent of database type and structure, the problem of information access and retrieval has typically been approached with a single primary specification: find the data of interest and deliver it. For extremely large, complex and far-flung databases this approach is analogous to the command "visit every apple orchard in the world and bring back only the sweetest apples". The search engine dutifully visits every apple orchard, finds the closest thing to "sweetest" and harvests those. The result is one big pile of apples. Too big, usually. The user is then faced with sampling every apple in the pile. Even then, some useful information remains hidden. For example, if the very sweetest apples all came from one region, that could only be determined by another search or by post-processing the tags on all the apples. Thus, sometimes it is the connection among data items, as opposed to the items themselves, that is the most valuable information.

Anyone who has attempted to use the Internet for research or entertainment has encountered the inadequacy of the current generation of software to some extent. The symptoms are slow access times, erroneous search results, circular links, pointers to defunct data sets or locations, dead-end links and screen after screen of useless web pages. One approach to improving the situation has been the application of "Intelligent Agents". The idea is that these software robots will assist the user by handling some of the grunt-work associated with web data access, retrieval and presentation. At most, these tools will automatically log onto the net, go to a list of web sites and then download the data from them to the user's hard disk drive. In every case, the robot has to be trained: told what pages to go to and what kind of data to download. They are not much help, but they are better than nothing.

When the database consists of digital videotape clips or other real-time data such as digital audio, the problem lies not so much in the scope or size of the database as in the requirement that the data be retrieved in real time, at the frame rate or sample rate of the content. Slow-motion or interrupted playback of visual and audio material is generally not of much interest. For databases of video clips on local disk drives or databases spread across networks, seamless access from one data item to the next is difficult and in many cases impossible. The present generation of database technology is not designed for real-time data processing.

For a database composed of the executable (machine or assembly language) code of a computer program, a set of input data, control vectors (user input), and output space (display memory or other target), the present generation of database technology is virtually useless. At most, the frequency of occurrence of specific words or strings can be discovered, or some parts of the structure can be deduced. Little can be learned about the correctness of the code.

Specialized analysis software does exist that can "snoop" while a program executes, creating a database containing information about the history of program execution and dataflow through the program. However, these tools are intrusive, provide limited functionality and consequently are not widely used.

ImaginOn System
ImaginOn's technology is contained in three functional areas: transformation, synthesis and playback. Although it is convenient to separate the software operations of the system into these parts, in actual use the distinction is not always clear. This fuzziness arises because some functions are called by more than one part of the system and some operations are performed in parallel.

Transformation
Database transformation is based loosely on the digital signal processing (DSP) model. Whereas the data in DSP is usually derived by sampling a real-time signal of some sort, the input data in the ImaginOn system is some pre-existing database. Any database will do: a text file, a directory of filenames, images, video clips, a network of data servers, whatever. The input to the transform function is the source database (or source databases) and the output is a subset of the source database. Between input and output is a "black box" containing the transform engine and its filters. The transform procedure is similar in theory to the creation and operation of finite impulse response (FIR) filters on digitally sampled signals such as audio or video waveforms. As in digital signal processing with transform functions, database transformations can be tuned: filters can be designed, or transform functions specified, to suit the input data and optimized to create output suited to the user's particular needs. Filtering functions can be performed on the database in the temporal, spatial and frequency domains.
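The "black box" can be pictured as a chain of filter functions. Each one takes a list of records and returns a subset of it. This sketch shows only that composition. It does not implement the FIR mathematics that the analogy refers to. The record layout (a dict with a "kind" field) and all names are assumptions.

```python
from typing import Callable, Iterable, List

Record = dict  # one item of a source database (assumed to be an attribute dict)
Filter = Callable[[List[Record]], List[Record]]


def class_filter(allowed):
    """Return a filter that keeps only records whose 'kind' is in allowed."""
    return lambda records: [r for r in records if r.get("kind") in allowed]


def transform(sources: Iterable[Iterable[Record]], filters: List[Filter]) -> List[Record]:
    """Merge one or more source databases, then pass the result through each filter.

    Each filter here only removes records, so the output is a subset of the input.
    """
    data = [rec for source in sources for rec in source]
    for apply_filter in filters:
        data = apply_filter(data)
    return data


text_db = [{"id": 1, "kind": "text"}, {"id": 2, "kind": "video"}]
media_db = [{"id": 3, "kind": "image"}, {"id": 4, "kind": "video"}]
out = transform([text_db, media_db], [class_filter({"video", "image"})])
print([r["id"] for r in out])  # [2, 3, 4]
```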

An example of a temporal filter is updating an output database by sampling a source database at specific time intervals. A spatial filter can delete content from a database to meet storage requirements. A frequency-based filter can create an output database by including items only if they are pointed to (addressed) more than a specified number of times by other nodes in the source database. Of course, filtering can be based on more than one dimension at a time and to varying degrees in each dimension. Virtually all of the filtering techniques developed for DSP can be applied in database transformations. To name a few: decimating, bandpass, high-pass, low-pass, comb, shelving, and notch.
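The three example filters could be sketched as follows. Several interpretations here are assumptions, not taken from the source. Each record carries a timestamp "t", a "size" and a list of outgoing "links". The temporal filter keeps the first record in each time window. The spatial filter greedily keeps records while they fit a storage budget. The frequency filter counts distinct referring records. Because each function takes and returns a record list, they can be chained to filter on several dimensions at once.

```python
from collections import Counter


def temporal_filter(records, interval):
    """Keep the first record in each interval-second window of its timestamp 't'."""
    kept, last_window = [], None
    for rec in sorted(records, key=lambda r: r["t"]):
        window = rec["t"] // interval
        if window != last_window:
            kept.append(rec)
            last_window = window
    return kept


def spatial_filter(records, budget):
    """Greedily keep records, in order, whose 'size' still fits the storage budget."""
    kept, used = [], 0
    for rec in records:
        if used + rec["size"] <= budget:
            kept.append(rec)
            used += rec["size"]
    return kept


def frequency_filter(records, min_refs):
    """Keep records pointed to by more than min_refs other records (self-links ignored)."""
    refs = Counter(
        target
        for rec in records
        for target in set(rec["links"])
        if target != rec["id"]
    )
    return [rec for rec in records if refs[rec["id"]] > min_refs]


db = [
    {"id": "a", "t": 0, "size": 5, "links": ["b", "c"]},
    {"id": "b", "t": 4, "size": 5, "links": ["c"]},
    {"id": "c", "t": 12, "size": 5, "links": ["a"]},
    {"id": "d", "t": 15, "size": 5, "links": ["c"]},
]
print([r["id"] for r in temporal_filter(db, 10)])   # ['a', 'c']
print([r["id"] for r in spatial_filter(db, 12)])    # ['a', 'b']
print([r["id"] for r in frequency_filter(db, 1)])   # ['c']
```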

In the web processing application of ImaginOn technology, "WebZinger", the transform function produces a condensed format suitable for high-speed access during playback. The DSP analogy to this is lossy data compression. Comparing the input versus the output databases, ImaginOn performs compression in four ways:

1. Reduction in scope of the database
The number of nodes made available for processing is less than the number of nodes in the source network.  The extent of the scope is determined by storage availability and time limitations.

2. Decimation of the number of nodes within the scope of the process
Many of the nodes processed are deleted, never appearing in the output database. The extent to which a selected node matches the node selection criteria is saved as a data array in a format like that of the selection criteria data array itself.  Nodes may also be selected by other filtering functions such as time or frequency of reference by other nodes.

3. Bandpass filtering of the data within each node
Only a fraction of the total data within a node is reformatted and saved. The parameters of the filter are set by the user or the default parameters are used. Filtering may be of any type, including by data class (audio, video, still image, text, etc.), by time (for example, only take the data pertaining to specific dates), by frequency of update (for example, only take data that is updated hourly), or spatially (for example, take data from twenty geographically diverse sites), etc.

4. Link information is reduced or deleted
Since only qualified nodes are included in the output database, all of the links representing unused nodes along the path to a selected node can be replaced with a single vector, a synthetic link that never existed in the source database. Likewise, all unused links pointing out of a selected node are deleted, except those at terminal nodes. At terminal nodes (those at the end of a series of linked nodes), the unused outward-pointing links are saved for future use in a transformation that is started from that terminal node.
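Steps 1, 2 and 4 can be sketched as graph operations, as an illustration only and not as ImaginOn's implementation. The sketch makes several assumptions. The source network is a dict of outgoing links. Scope is limited by a breadth-first crawl capped at max_nodes. Selection is a hypothetical per-node score compared with a threshold. Step 3, filtering the data inside each node, is omitted.

```python
from collections import deque


def compress(links, start, max_nodes, score, threshold):
    """Sketch of steps 1, 2 and 4 on a source network.

    links: dict mapping each node to the list of nodes it links to
    score: function giving a node's (hypothetical) match score
    Returns (synthetic_links, pending_links).
    """
    # Step 1: reduce scope with a breadth-first crawl capped at max_nodes.
    scope, seen, queue = [], {start}, deque([start])
    while queue and len(scope) < max_nodes:
        node = queue.popleft()
        scope.append(node)
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    in_scope = set(scope)

    # Step 2: decimate, keeping only nodes whose score meets the threshold.
    selected = [n for n in scope if score(n) >= threshold]
    is_selected = set(selected)

    # Step 4: replace each chain through unselected nodes with one synthetic link.
    synthetic, pending = {}, {}
    for src in selected:
        targets, visited = set(), set()
        stack = list(links.get(src, []))
        while stack:
            n = stack.pop()
            if n in visited or n not in in_scope:
                continue
            visited.add(n)
            if n in is_selected:
                targets.add(n)  # a synthetic link ends here
            else:
                stack.extend(links.get(n, []))  # keep following the unused chain
        synthetic[src] = sorted(targets)
        if not targets:
            # Terminal node: save its original outward links for a later transform.
            pending[src] = list(links.get(src, []))
    return synthetic, pending


links = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": [], "F": []}
scores = {"A": 1.0, "B": 0.1, "C": 0.2, "D": 0.9, "E": 0.1, "F": 0.8}
synthetic, pending = compress(links, "A", max_nodes=10, score=scores.get, threshold=0.5)
print(synthetic)  # {'A': ['D'], 'D': ['F'], 'F': []}
print(pending)    # {'F': []}
```

Here the chains A→B→D and A→C→D collapse into one synthetic link A→D. F is a terminal node, so its outward link list (empty in this toy network) is kept for a future transform. With max_nodes=3, only A survives decimation, and its original links to B and C are saved as pending.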

Synthesis
A synthetic network, in the context of this discussion, is one that does not exist either in nature or as a pre-existing manmade structure. The process of network synthesis is, considered by itself, nothing new. Multidimensional tree-like data structures, their creation and implementation, have been a part of computing since day one. It is the close coupling of ImaginOn network synthesis with ImaginOn database transformation that yields tree-like networks that are novel and useful. In synthesis, the output database created by the transformation process is structured as a network in "n" dimensions according to a set of rules using parameters that are set by the user, or the default settings. The purpose of the structure is threefold: the structure serves as "rails" that the playback engine can traverse, the non-linear sequential organization preserves much of the original connectivity information that was present in the source database before transformation, and a time axis is implied. It is the combination of the "rails" and the implied time axis that provides the user with the illusion of virtual motion in "dataspace" during playback of the transformed database.

Networks synthesized by ImaginOn are variable in size, shape and complexity. Each of these variables can be set by the user. Size refers to the total quantity of data that is contained in the synthetic network. Shape refers to the relationship between the depth (or height) of the tree and its width. Complexity refers to how many dimensions, or branches, radiate from each node in the tree. While there are no theoretical limits to any of these variables, there are obvious practical considerations that pertain to storage limitations, display limitations and computational time. Similarly, there is no requirement that structures be symmetrical, though for ease of conceptualization, symmetrical trees are preferred.
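Size, shape and complexity can be made concrete by laying output items onto a symmetrical tree. The tree has trunk_len trunk nodes, and every node carries branching branches down to depth layers. The parameter names and the depth-first fill order are assumptions made for this example. For a full tree, the size follows directly from shape and complexity: trunk_len × (1 + b + b² + ... + b^depth).

```python
_END = object()  # sentinel marking that the item stream is exhausted


def capacity(trunk_len, branching, depth):
    """Size of a full symmetrical tree: trunk_len * (1 + b + b**2 + ... + b**depth)."""
    return trunk_len * sum(branching ** k for k in range(depth + 1))


def synthesize(items, trunk_len, branching, depth):
    """Fill a tree depth-first with items; stop early if the items run out."""
    stream = iter(items)

    def grow(level):
        item = next(stream, _END)
        if item is _END:
            return None
        node = {"item": item, "children": []}
        if level < depth:
            for _ in range(branching):
                child = grow(level + 1)
                if child is None:
                    break
                node["children"].append(child)
        return node

    trunk = []
    for _ in range(trunk_len):
        node = grow(0)
        if node is None:
            break
        trunk.append(node)
    return trunk


def count_nodes(nodes):
    """Total number of nodes in a list of subtrees."""
    return sum(1 + count_nodes(n["children"]) for n in nodes)


tree = synthesize(range(100), trunk_len=3, branching=2, depth=2)
print(capacity(3, 2, 2), count_nodes(tree))  # 21 21
```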

ImaginOn synthetic networks are stored in computer memory as a database of items commonly referred to as "objects". Each object in the structure, whatever its class (for example, trunk node, branch node or data page), contains pointers to the actual physical location of the data it references as well as pointers to any other objects it is linked to in the tree structure. When such a synthetic network is embodied as a schematic drawing, the objects are graphic library elements and the links are defined in a text file known as a "netlist". Since the graphical elements can also be represented as sets of x,y coordinates, the entire synthetic network can be stored as ASCII text in human-readable files, if so desired.
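The object-plus-netlist storage might look roughly like the sketch below. The OBJ/NET line format is invented for this example; it is not EDIF, SPICE or any other real netlist standard. The sketch also assumes that names and source locations contain no spaces.

```python
from dataclasses import dataclass, field


@dataclass
class NetObject:
    name: str      # unique object name, e.g. "T1"
    kind: str      # "trunk", "branch" or "page"
    x: int         # drawing coordinates of the graphic element
    y: int
    source: str    # physical location of the referenced data (no spaces assumed)
    links: list = field(default_factory=list)  # names of linked objects


def to_netlist(objects):
    """Write objects, then links, as human-readable ASCII lines."""
    lines = [f"OBJ {o.name} {o.kind} {o.x} {o.y} {o.source}" for o in objects]
    lines += [f"NET {o.name} {target}" for o in objects for target in o.links]
    return "\n".join(lines)


def from_netlist(text):
    """Rebuild the objects and their links from text written by to_netlist."""
    objects = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        tag, *rest = line.split()
        if tag == "OBJ":
            name, kind, x, y, source = rest
            objects[name] = NetObject(name, kind, int(x), int(y), source)
        elif tag == "NET":
            objects[rest[0]].links.append(rest[1])
    return list(objects.values())


net = [
    NetObject("T1", "trunk", 0, 0, "/data/home.html", ["B1", "T2"]),
    NetObject("B1", "branch", 1, 1, "/data/news.html", ["P1"]),
    NetObject("P1", "page", 2, 1, "/data/story.html"),
    NetObject("T2", "trunk", 1, 0, "/data/about.html"),
]
print(to_netlist(net))
```

Because the text form holds every field, reading it back with from_netlist reproduces the original object list.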

By definition, the synthetic network is hierarchical. The top level of the hierarchy is represented by the nodes along the trunk of the tree. The next level down contains the first layer of branches and so on down through layers of branches to the lowest level which contains the leaves. Since the actual physical address of the source data from which the transformed data was obtained is stored along with the data for each node in the tree, new synthetic networks can be created which are continuations of any previous network by performing a new transform on the source database beginning at that point.

Playback
Since the present generation of database technology was developed primarily for business and scientific uses, the underlying concept behind data presentation in database products is information transfer from computer to user. Static screen displays of complex data and comprehensive printouts of search results are the norm. Rapid access to large quantities of search results is typically supported by graphical lists of contents, or indexes, from which the user can select the next page to display. When the results database is extremely large, even indexed displays may require large numbers of keystrokes and many hours of viewing time to zero in on a specific item of interest.

Conventional data presentation methods such as graphical browsers and relational database formatted-field displays require the user to indicate what the next display screen will be and when it will be displayed. Typically, the user accomplishes this by clicking on a "next" button or a line of displayed hypertext which is a "hot-link". There is no way to avoid this one-page-at-a-time procedure. ImaginOn offers a completely different model of data presentation, similar in some respects to the way consumer entertainment products work. The concept of "playback" is utilized in the same sense that VCRs play back videotape, or CD players play back music.

ImaginOn's playback technique is based on an adaptive algorithm that determines both what will be displayed next and when it will be displayed. This feature allows the user to be completely passive, or as active as they want to be, during the data presentation process. When the user decides to be a completely passive viewer, ImaginOn playback is automated according to its internal rules. This capability is fundamental to ImaginOn's computer software program testing application.

The analogy between a VCR's playback of videotape and ImaginOn's playback of an output database aptly describes the seamless, continuous nature of the presentation, but it does not encompass the new features provided by the system. These unique capabilities include dynamic path selection through the database, adaptive automation, variable playback resolution, and hierarchical playback.

Dynamic path selection
During playback, the user traverses a path through ImaginOn's synthetic network organization of the output database. The path may start anywhere in the network, though the default is the first Trunk Node. The actual vector the user follows through the network is determined dynamically, provided the user gives some direction. The user selects the path vector with a single key, or button, by responding to the alternating branch choices presented visually (or audibly, in the case of an audio database) at Trunk Nodes and Branch Nodes. The next data item is then presented immediately, without an interruption in the display or audio output. Should the user fail to make a selection, playback continues along a default path, without interruption, after a predetermined number of alternating choice presentations.
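The one-key selection scheme can be modeled as below. The model makes three assumptions. Choices are shown in strict rotation. A key press selects whichever choice is on screen at that moment. The first listed branch is the default taken after max_flips presentations. The network layout and function names are hypothetical. A max_steps guard stops the walk if the network contains a cycle.

```python
def choose_branch(branches, presses, max_flips, default=0):
    """Show branches in rotation; presses[i] is True if the key was pressed during flip i.

    Returns the branch on screen at the first press, or the default branch
    after max_flips presentations with no press.
    """
    for flip in range(max_flips):
        shown = branches[flip % len(branches)]
        if flip < len(presses) and presses[flip]:
            return shown
    return branches[default]


def play_path(network, start, presses_at, max_flips, max_steps=1000):
    """Walk from start, choosing a branch at every node that has one."""
    path, node = [start], start
    while network.get(node) and len(path) <= max_steps:
        node = choose_branch(network[node], presses_at.get(node, []), max_flips)
        path.append(node)
    return path


network = {"T1": ["T2", "B1"], "T2": [], "B1": ["P1", "P2"]}
print(play_path(network, "T1", {}, max_flips=4))                     # ['T1', 'T2']
print(play_path(network, "T1", {"T1": [False, True]}, max_flips=4))  # ['T1', 'B1', 'P1']
```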

Adaptive automation, autoplay and adaptive transforms
Every time the user makes a selection, thus determining the current path vector, the user's profile is updated to reflect the state of the match between the present node's selection criteria and the user's profile. In this way, the playback algorithm "learns" about the user's preferences. If, after guiding playback for a while, the user ceases to make selections, playback may be continued automatically utilizing those stored preferences. In effect, the playback algorithm acts on behalf of the user, as adapted from the history of the user's path through the synthetic network. Either the adapted user profile, or the initial user profile may be selected to play back a database path automatically, without user intervention. This mode of operation is called "autoplay".
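One simple way to realize this kind of learning is sketched below. The sketch assumes that the user profile and each node's selection criteria are both numeric feature vectors. Each selection moves the profile part of the way toward the chosen node's criteria, which is an exponential moving average. Autoplay then follows the branch with the highest dot-product match. The update rule and all names are illustrative assumptions, not ImaginOn's documented algorithm.

```python
def update_profile(profile, criteria, rate=0.5):
    """Move the profile a fraction rate toward the chosen node's criteria vector."""
    return [(1 - rate) * p + rate * c for p, c in zip(profile, criteria)]


def match(profile, criteria):
    """Degree of match between a profile and a node's criteria (dot product)."""
    return sum(p * c for p, c in zip(profile, criteria))


def autoplay_choice(profile, branches, criteria_of):
    """Autoplay: pick the branch whose criteria best match the profile."""
    return max(branches, key=lambda b: match(profile, criteria_of[b]))


criteria_of = {"news": [1.0, 0.0], "sports": [0.0, 1.0]}
profile = [0.5, 0.5]                                    # initial profile
profile = update_profile(profile, criteria_of["news"])  # the user picks "news"
print(profile)                                          # [0.75, 0.25]
print(autoplay_choice(profile, ["news", "sports"], criteria_of))  # news
```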

The ImaginOn playback software maintains a history of all user preferences and selections since the time of program initialization. This user history may be used by the database transform process to set the selection criteria and filters for creation of a new output database to be made from any source database. The new output database can be grafted onto the existing synthetic network, or a new network can be created. If "learned" criteria are used to drive the database transformation, the transformation itself can then be considered adaptive with respect to the user's history of preferences.
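Suppose, for illustration, that each selected node's criteria and each source item are numeric feature vectors. A user history could then drive a new transformation in two steps. First, average the criteria of every node the user has selected. Second, keep only the source items whose match with that average meets a threshold. The averaging rule, the threshold and the names are hypothetical.

```python
def learned_criteria(history):
    """Average the criteria vectors of every node the user has selected so far."""
    n = len(history)
    return [sum(column) / n for column in zip(*history)]


def adaptive_select(source, history, threshold):
    """Select source items whose dot-product match with the learned criteria meets threshold."""
    criteria = learned_criteria(history)
    return [
        name
        for name, vector in source.items()
        if sum(c * v for c, v in zip(criteria, vector)) >= threshold
    ]


history = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # three past selections
source = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
print(adaptive_select(source, history, threshold=0.5))  # ['a', 'c']
```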

Variable playback resolution
By setting the playback parameters for the time each data page is held on screen and the rate at which the alternating branch choices "flip", the user can control the overall speed of path playback. When these hold and flip times are set very low, a fast-forward effect results. This provides the user with a kind of skimming, low-resolution view of the paths.
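The fast-forward effect comes down to simple arithmetic. Consider an unattended path in which each data page is held for hold seconds and each choice point flips flips_per_choice times before the default branch is taken. The sketch below computes the total playback time for such a path. The parameter names and numbers are made up for illustration.

```python
def path_duration(pages, choice_points, hold, flip, flips_per_choice):
    """Seconds to play a path unattended: every page is held for hold seconds, and
    every choice point flips flips_per_choice times at flip seconds each."""
    return pages * hold + choice_points * flips_per_choice * flip


normal = path_duration(pages=20, choice_points=5, hold=4.0, flip=1.0, flips_per_choice=4)
fast = path_duration(pages=20, choice_points=5, hold=0.5, flip=0.25, flips_per_choice=4)
print(normal, fast)  # 100.0 15.0
```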

Hierarchical playback
If the user does not select autoplay mode and never makes a branch choice, playback proceeds by default from Trunk Node to Trunk Node. Since Trunk Nodes represent the main "stems" of the n-dimensional tree, the user will only see data pages along those main, top-level paths. If the user makes only one branch choice per Trunk Node, only the first Branch Node off of each Trunk Node is displayed, thus providing a view of the data along only the first layer of branches in the tree. This manner of playback proceeds successively down through the layers of the tree. The result is hierarchical playback, in which only paths down to a user-selectable level of interest in the synthetic network are enabled.
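Read this way, the user-selectable level acts as a depth limit on playback. The sketch below rests on three simplifying assumptions. Each trunk node carries a list of branch nodes. Each branch choice descends one more layer along the first branch. Playback then resumes at the next trunk node. The node layout is hypothetical.

```python
def hierarchical_playback(trunk, max_level):
    """Play each trunk node, then follow its first branch down at most max_level layers.

    trunk: list of nodes shaped like {"page": str, "branches": [node, ...]}
    """
    sequence = []
    for node in trunk:
        sequence.append(node["page"])
        current, level = node, 0
        while level < max_level and current["branches"]:
            current = current["branches"][0]
            level += 1
            sequence.append(current["page"])
    return sequence


leaf = {"page": "T1.b1.b1", "branches": []}
trunk = [
    {"page": "T1", "branches": [{"page": "T1.b1", "branches": [leaf]}]},
    {"page": "T2", "branches": [{"page": "T2.b1", "branches": []}]},
]
for level in range(3):
    print(level, hierarchical_playback(trunk, level))
```

Level 0 plays only the trunk (T1, T2). Each higher level adds one more layer of branch pages under each trunk node.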