Filesystems and Files on the AS/400

This article isn't finished yet or needs to be revised. Please keep in mind that it may therefore be incomplete.

Reason:

  • Add a chapter on how to create physical and logical files,
  • Move the explanations of access paths to a programming article?

Since the AS/400 was designed as a helper machine for business tasks, its processing capabilities mainly cover structured data with comparatively small record sizes. With the ongoing evolution of OS/400, IBM added more functionality that waters down this concise definition, though. On the other hand, the same added functionality makes it easier to store data for, and exchange data with, other platforms.

Different approaches to data storage and processing

Structured data can be processed most efficiently when the underlying logical and physical facilities support this approach natively instead of merely mimicking it.

By comparison, most platforms outside the world of mainframes and minicomputers, as they were once known, operate on byte streams. Byte streams don't have any natural structure visible from the outside. That is, without analyzing the content of the stream at any given moment, there is no way to map the current byte position onto any logical structure.
This is somewhat similar to the difference between asynchronous data transmission (such as RS-232 specifies) and synchronous transmission. With asynchronous transfers, the data payload has to be enriched with additional information so that processing logic can chop the continuous stream of data into pieces for the processing facilities to work on in a structured way.

Most, if not all, data is based on some structure, though. The simplest structure is a blob of data of a given size, usually named a block. On close inspection, conversion between byte streams and blocks takes place on many levels:

  • Storage of data on hard disks or in solid state memory in blocks of 512 Bytes or 4 KiB,
  • Paging and Swapping activity of virtual memory systems also takes place in blocks instead of streams,
  • Data transfer over today's networks is done in pieces called frames.[1]

On a higher level of a file's content definition, more elaborate structuring of stream-based data can be found:

  • Pixel-oriented picture files include a header that dynamically defines the format of the data that follows,
  • Movie files are defined by different kinds of data blocks: to save space, many frames are only saved as differences from the preceding picture,[2]
  • Today's most-used text processing software internally uses XML to give seemingly unstructured data a structure,
  • Application program files are also structured in a way that permits the operating system to efficiently load and execute the contained code,
  • Even plain text files have a structure: they consist of words, sentences, and paragraphs.

The mapping of structured data onto byte streams trades processing speed for the flexibility to store arbitrarily complex data structures in very simple data structures on disk. Today, the processing overhead of mapping byte streams into records (blocks) or more elaborate data structures is almost nullified by the cheap availability of sheer processing power.

Storage Overhead

However, mapping byte streams into logical blocks of information often creates storage overhead. For example, while a plain text file with the word 'Hello' uses only 5 Bytes of actual data[3] (sometimes followed by implicit end-of-line marker(s)), the file occupies at least 512 Bytes on disk, since this is the minimum block size of usual hard disks. If the file is expanded to exactly 512 Bytes by adding more text, the overhead is zero. Adding just one character makes the data continue into the next disk block and again occupies that whole block, of which 511 Bytes are wasted.
This overhead might be negligible when looking at today's file sizes. But every OS utilizes configuration files that usually are small and fit the above description perfectly. /etc/hosts may be 680 Bytes long on a given system, which makes 344 Bytes of wasted space, according to the above calculation.[4]
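
A minimal worked calculation of this overhead, with block size B and file size s as the only inputs:

    \mathrm{allocated}(s) = \left\lceil \frac{s}{B} \right\rceil \cdot B, \qquad \mathrm{waste}(s) = \mathrm{allocated}(s) - s

For the /etc/hosts example: s = 680 and B = 512 give allocated = 2 · 512 = 1024 Bytes, hence waste = 344 Bytes; with B = 4096 (see footnote 4), allocated = 4096 Bytes and waste = 3416 Bytes.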

Today, this storage overhead is also almost nullified by cheap availability of gigantic hard disk storage capacity.

Latency

Considering the above description, it's easy to see that actual data is scattered across a greater area of a given hard disk, with many holes in between, compared to all files being concatenated into one big blob. This increases latency, because the disk's head assembly must move more often to access a certain block on the physical media.

Today, I/O latencies are nullified by cheap availability of solid state memory “disks”.

AS/400 IFS

Early incarnations of the AS/400 had no facilities to save or process byte-based stream files. The only possibility was to create files with a fixed record structure. With the introduction of the integrated file system (IFS) in OS/400 Version 3, saving and processing of byte stream files became possible. The classical record-oriented file system was in turn named the QSYS.LIB file system and looks like a classical Unix mount point when viewed with applications that are aware of the IFS, such as qsh, the minimal Unix-like shell. The other file systems allow storing stream files.
Up to today, QSYS.LIB is still the most important file system on AS/400s, because most of the OS is stored there, as well as configuration database files and, of course, application-specific database files.
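
Within the IFS, the QSYS.LIB hierarchy follows a fixed naming scheme. As an illustration (library, file, and member names are made up), a member HELLO in the source file QCLSRC in library MYLIB appears to IFS-aware applications such as qsh under the path

    /QSYS.LIB/MYLIB.LIB/QCLSRC.FILE/HELLO.MBR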

This IFS is not the same as the file systems in Linux or other common operating systems. There, one or more disks are partitioned: space from one disk is split into slices of usable space by software. To make use of the space available in these partitions, a file system has to be created. That is, database-like structures for the mapping of files to disk blocks are written into any given partition space. The space available for the creation of files on any given file system is thus fixed to the size of the partition, minus overhead, minus the space needed for the aforementioned data structures. Multiple file systems have been conceived over the decades, each with a determined set of features, so it's possible to use different file systems to match the different storage needs of different types of files.

On the other hand, the IFS of the AS/400 is a purely logical structure that allocates disk blocks as requested by the operating system. If a record-oriented file in the classical QSYS.LIB file system is filled with data, disk blocks are filled with this data and are thus no longer available for other storage. If someone transfers a picture file to the AS/400's /QOpenSys file system, disk blocks are allocated for that data and likewise become unavailable. There is no static assignment of these different file systems to partitions or disk blocks.

Things saved in file systems on the AS/400 are usually referred to as objects.

AS/400 files

AS/400 file handling shares a common heritage with the file handling found on IBM mainframes (MVS or z/OS partitioned data sets). An AS/400 file defines at least the record length. All members of such a file share this maximum length and thus are constrained to it.
In addition, an externally described file can be stricter than an MVS PDS by not only enforcing a record length, but also an actual field layout and content over this record.[5]
This can be roughly compared to a directory on Linux or (better) a tar file where one may only add TIFF image data with a common horizontal pixel count.[6]

For example, by creating a source file on an AS/400, one defines a file with a certain record length.[7] One may use this file, create members, and add program source code to these members at will, but it's not possible to add data with any line longer than that record length into any member of that file.
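
Source files themselves are created with the CL command CRTSRCPF. As a rough sketch of what such a file looks like internally (library and table names are made up, and a real source file is not created via SQL), the default record layout can be approximated in SQL like this:

    -- Approximation of a default source file record: 6 + 6 + 80 = 92 Bytes.
    CREATE TABLE MYLIB.SRCDEMO (
      SRCSEQ NUMERIC(6, 2) NOT NULL DEFAULT 0,  -- sequence number of the source line
      SRCDAT NUMERIC(6, 0) NOT NULL DEFAULT 0,  -- date of the last change
      SRCDTA CHAR(80)      NOT NULL DEFAULT ''  -- the source text itself
    );

The two bookkeeping fields explain why a default record length of 92 leaves 80 characters for the actual source line.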

File Types

Basically, there are four types of files to be found in any AS/400's classical QSYS.LIB file system:

  • Source Files are physical files, aka database files, with at least two fields per record:
    • Line of text, primarily to hold program code to be compiled,
    • Timestamp, recording the last change of that particular line.[8]
  • Physical Files are again database files, but with a user-defined record length and field format.
  • Logical Files are files that reference one or more physical files but do not contain any data. LFs can be compared to these SQL terms (see the sketch after this list):
    • view: show only a subset of the fields of a PF,
    • join: show a superset of the fields of more than one PF,
    • index: apply different indexes, and thus a different record sorting.
In addition, restrictions about which records to select or omit from the PF can be added.
  • Other Files that appear as type *FILE in output listings, including, but not limited to,
    • Display Files that map data to positions on the screen and enable user interaction with these forms,
    • Printer Files that map data to positions on a virtual sheet of paper for later printer output,
    • Intersystem Communications Function (ICF) Files, which provide an easy way for application programs to talk to other programs on the same or a remote AS/400,
    • Save Files that provide roughly the same functionality as tar files on Linux, enabling the bundling of files (objects) of different types into one self-contained archive, for easier transmission to other systems for re-extraction, or for permanent storage.
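
Since SQL objects on the AS/400 are implemented on top of these very file types (a table is a physical file, a view or index a logical file), the analogy above can be made concrete. A hedged sketch with made-up names:

    -- A table corresponds to a physical file (PF):
    CREATE TABLE MYLIB.CUSTOMER (
      CUSNO DECIMAL(6, 0) NOT NULL,
      NAME  CHAR(30)      NOT NULL,
      CITY  CHAR(20)      NOT NULL
    );

    -- A view corresponds to a logical file showing a subset of fields:
    CREATE VIEW MYLIB.CUSTNAMES AS
      SELECT CUSNO, NAME FROM MYLIB.CUSTOMER;

    -- An index corresponds to a keyed logical file, i.e. an access path:
    CREATE INDEX MYLIB.CUSTBYNAME ON MYLIB.CUSTOMER (NAME);

The CUSTOMER table introduced here is reused in the sketches further below.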

Handling of these files will be explained in separate articles.

A word on physical and logical (database) files

The main usage of logical files is to provide access paths for reading data that differ from those of the linked physical file. An access path is just one or more database indexes created over one or more fields of a given physical file.

When accessing database files through the classical READ(P)(E) API calls, there's no way to sort a result set dynamically, as is easily done with SQL. There isn't even a result set. There's just a kind of file pointer that points to the last-read row, and the order in which the rows are retrieved by subsequent READs (which advance this file pointer) is determined by the access path(s).
Keep in mind that this handling of structured data stems from a time period when computers had only a tiny fraction of today's processing power[9] and storage capacity.
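
A hedged analogy in SQL terms (cursor name made up, CUSTOMER as above): a READ loop behaves roughly like fetching row after row from an open cursor, except that with native I/O the ordering is fixed by the access path instead of an ORDER BY clause:

    -- Embedded SQL sketch: the cursor plays the role of the file pointer.
    DECLARE C1 CURSOR FOR
      SELECT CUSNO, NAME, CITY
        FROM MYLIB.CUSTOMER
        ORDER BY NAME;  -- with native I/O, this ordering comes from the access path

    OPEN C1;
    FETCH NEXT FROM C1 INTO :CUSNO, :NAME, :CITY;  -- comparable to a single READ
    CLOSE C1;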

Considering the clear positioning of the AS/400 as a database machine by IBM, and adding the fact that the precursor to SQL was created in the early 1970s by IBM employees, SQL may well have been available on the AS/400 from its first release.
On older machines, SQL proves to be painfully slow, especially when dealing with huge database tables. Since the SQL engine dynamically adds database indexes as needed when processing queries, getting data out of a table can be really slow when such a temporary index must be created in advance.
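
A hedged example with the CUSTOMER table from above: an ORDER BY over a field without an existing access path may force such a temporary index, while a permanent index avoids the repeated build:

    -- May trigger the creation of a temporary access path on every run:
    SELECT CUSNO, NAME, CITY FROM MYLIB.CUSTOMER ORDER BY CITY;

    -- A permanent index lets subsequent queries reuse the access path:
    CREATE INDEX MYLIB.CUSTBYCITY ON MYLIB.CUSTOMER (CITY);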

On the other hand, utilizing the legacy API calls to retrieve and process random records with CHAIN/READ from multi-GiB database tables is surprisingly quick on old machines, for many reasons.

This somewhat restricted API call interface explains why data handling within applications on the AS/400 feels so fundamentally different compared to what we're used to.

Most common platforms utilize SQL:

  • The user provides (part of) a query string, the database picks the matching rows, and
  • the application delivers a limited view of the whole database content back to the user.
    • The user may easily change the sorting of the result records based on any single field, usually by clicking on the respective table header in the application's UI,
  • the user may finally select the desired record for further processing.

AS/400 applications usually…

  • provide a complete list view[10] of the entire database content, with a fixed sort order. The user may either scroll to the desired record, or type part of a search string into a designated input line and press Enter.
  • The application reloads the list view after CHAINing (placing the file pointer) to the first record matching (part of) the search string[11]; the rest of the list is filled with subsequent READs until the list is full or EOF is reached. (See the sketch after this list.)
    • Skillful programming techniques allow the user to freely scroll forward and backward from the record retrieved in the last step.
  • The user may finally select the desired record for further processing.
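
As a hedged SQL approximation of this positioning technique (again using the CUSTOMER table from above; :SEARCH is a host variable holding the user's input), loading one page of the list could look like this. Footnote 11's point is that the match is made against the leading part of the key, not by a LIKE '%…%' scan over the whole table:

    -- Position at the first record whose key is >= the search string,
    -- then read ahead one subfile page (here: 14 rows):
    SELECT CUSNO, NAME, CITY
      FROM MYLIB.CUSTOMER
      WHERE NAME >= :SEARCH
      ORDER BY NAME
      FETCH FIRST 14 ROWS ONLY;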

Programs need neither maintain the list contents programmatically nor allocate memory for loading the whole database. Details will be explained in the articles about programming.

Footnotes

  1. To be precise, this isn't done to ease or enable record-oriented processing but to enable sharing of physical media (like cables) among the many participants of a network.
  2. From time to time, a key frame contains a complete picture, to compensate for data loss, which can happen for many reasons.
  3. Counted as ASCII, or SBCS, as it is named in the AS/400 world.
  4. Actually, with today's partition sizes, Linux uses 4 KiB blocks, not 512 Bytes. This raises the overhead of said /etc/hosts to 3416 Bytes.
  5. The term external is greatly misleading here, because the file definition is external only to the program using that file. Indeed, the field layout is internal to the file object.
  6. This is fairly impractical in computing as we know it.
  7. 92 Bytes by default.
  8. SEU shows these timestamps on the very far right of its display.
  9. Processing as in Read from disk — Modify — Write to disk.
  10. These lists are called subfiles.
  11. To be precise, the search string is matched against the start of the respective strings in the key field. It's not a … LIKE '%FOOBAR%' match, which is simply not possible with the legacy API.