IBM BOOK format: Difference between revisions

From Try-AS/400
(+fixme)
(→‎Internal structure: Fix typos, TERSE)
 
(3 intermediate revisions by the same user not shown)
Line 1: Line 1:
{{FIXME|Rework, add information, add proper categories}}
{{FIXME|Rework, add information}}
IBM BookManager is IBM's legacy online documentation system, historically used mostly on mainframes, midrange (AS/400) and OS/2, although it also saw some use in other areas including AIX. IBM no longer uses it for documenting current versions of its products, but there is an immense quantity of surviving electronic documentation from the period when IBM actively used it (from the 1990s thru 2010s), particularly documentation CD-ROMs/DVD-ROMs and images thereof.
'''IBM BookManager Softcopy''' is IBM's legacy online documentation system, created in the 1980s. It was historically used mostly on mainframes, midrange (AS/400) and OS/2, although it also saw some use in other areas including AIX.


The standard extension is BOO. No public documentation is known to be available for it. The header of BOO files commonly contains an EBCDIC copyright notice (this is true even for books for ASCII-based platforms such as OS/2). The file contents is almost certainly also in EBCDIC, but is not readable beyond the header. Maybe it is compressed with unknown compression algorithms (a relative of IBM's well-known proprietary algorithms such as TERSE?). However, the data seems too repetitive to be compressed–possibly it is just some obfuscation mechanism then?
IBM no longer uses it for new documentations of its products, but there is an immense quantity of legacy electronic documentation from the period when IBM actively used it (from the 1980s thru 2010s), particularly documentation CD-ROMs/DVD-ROMs and images thereof.


https://github.com/kev009/boo2pdf is code to convert BOO files to PDF. However, it relies on using the IBM SoftCopy Reader (for 32-bit Linux)'s JAR files and native code libraries (.so files) to actually read the file. The author of that code previously hosted a free public web service for that conversion, but discontinued it.
The file-internal format apparently is called IBMIDDOC.


Versions of IBM's SoftCopy Reader (previously known as IBM Library Reader and before that IBM BookManager READ) also existed for Windows, AIX (any other commercial Unix systems?), OS/2, DOS (meaning PC-DOS/MS-DOS, not mainframe DOS), MVS (later known as OS/390 then z/OS), VM/CMS (later known as z/VM), and OS/400 (later IBM i5/OS and then IBM i)–but apparently *not* mainframe DOS (DOS/360, DOS/VS, DOS/VSE, z/VSE).  
The standard DOS style extension is <code>BOO</code>, and "application/book" is sometimes used as MIME type, but was never officially registered.


"application/book" is sometimes used as a MIME type but was never officially registered
== Internal structure ==
 
The header of BOO files commonly contains an EBCDIC copyright notice (this is true even for books for ASCII-based platforms such as OS/2). The file contents is almost certainly also in EBCDIC, but is not readable beyond the header. Most likely this due to compression with an unknown algorithm (a relative of IBM's well-known proprietary algorithm such as TERSE?). However, the data seems too repetitive to be compressed–possibly it is just some obfuscation mechanism then?
== Notes on file format ==


* Bytes 1-2: unknown 2 bytes probably flags. second byte never zero, first byte often is (but not always). probably not a size, since 0001 is a common value.  
Line 19: Line 18:
The header is 256 bytes long. The copyright string is padded to the end of the header with 0x40 (EBCDIC space). However, it seems to have a null terminator (0x00) at the end of the actual text, before the space padding.


What follows from 256 bytes onwards looks like it could be some kind of compression dictionary???? Not clear.
What follows from 256 bytes onwards looks like it could be some kind of compression dictionary, maybe a modified TERSE algorithm?
 
<s>It looks like the file is composed of 256 byte records (so file size should be an integer multiple of 256), often with some all-zeros records appended to the end.</s>
 
There is weak evidence that the file format might be record oriented with 4096 byte records: IBM upload instructions into MVS datasets hint towards that. All known files so far have a file size which can be divided by 4096 without remainder. If the file size doesn't match, it's padded with binary zeros.
 
== See also ==
* [[Reviving InfoSeeker]]
 
== Weblinks ==
* [http://www.edm2.com/index.php/BookManager BookManager Releases] contains history of IBM BookManager releases
* [https://www.ibm.com/support/pages/using-pdfs-and-bookmanager-books-your-workstation-or-mainframe Using PDFs and BookManager Books on your workstation or mainframe], IBM.com
* [https://en.wikipedia.org/wiki/Terse_(file_format) Terse (file format)], Wikipedia
* [https://en.wikipedia.org/wiki/LZ77_and_LZ78 LZ77 and LZ78] compression algorithms, Wikipedia
* [https://patents.google.com/patent/US4814746A/en?oq=4814746 Data compression method] for TERSE, patents.google.com
* [http://fileformats.archiveteam.org/wiki/ZIP ZIP archive file format], archiveteam.org, mentions TERSE.
* [https://hercules-390.yahoogroups.narkive.com/gYwJ3QUu/terse-for-pcs-windows-aix-linux terse for PCs (Windows, AIX, Linux ....)], hercules-390@yahoogroups.com
* [https://bit.listserv.ibm-main.narkive.com/82nDDwrh/trsmain-question TRSMAIN question], IBM-Main mailing list
* [https://share.confex.com/share/121/webprogram/Handout/Session14242/14242-%20zOS%20Documentation%20Search%20Strategies%20%28Final%20Edition%29.pdf 14242: z/OS Documentation Search Strategies], presentation slides in PDF format


It looks like the file is composed of 256 byte records (so file size should be an integer multiple of 256), often with some all-zeros records appended to the end.
== Footnotes ==
<references />


== External links ==
[[Category: System Internals]]
* http://www.edm2.com/index.php/BookManager – contains history of IBM BookManager releases

Latest revision as of 23:58, 28 August 2023

This article isn't finished yet or needs to be revised. Please keep in mind that it may therefore be incomplete.

Reason: Rework, add information

IBM BookManager Softcopy is IBM's legacy online documentation system, created in the 1980s. It was historically used mostly on mainframes, midrange (AS/400) and OS/2, although it also saw some use in other areas including AIX.

IBM no longer uses it for new product documentation, but an immense quantity of legacy electronic documentation survives from the period when IBM actively used it (from the 1980s through the 2010s), particularly documentation CD-ROMs/DVD-ROMs and images thereof.

The internal file format is apparently called IBMIDDOC.

The standard DOS-style extension is BOO. "application/book" is sometimes used as a MIME type, but it was never officially registered.

Internal structure

The header of BOO files commonly contains an EBCDIC copyright notice (this is true even for books for ASCII-based platforms such as OS/2). The file contents are almost certainly also in EBCDIC, but they are not readable beyond the header. Most likely this is due to compression with an unknown algorithm (perhaps a relative of one of IBM's well-known proprietary algorithms, such as TERSE?). However, the data seems too repetitive to be compressed, so possibly it is just some obfuscation mechanism.

  • Bytes 1-2: unknown, probably flags. The second byte is never zero; the first byte often is, but not always. Probably not a size, since 0001 is a common value.
  • Bytes 3-6: these 4 bytes appear to always be zero.
  • Byte 7: unknown, can be zero; probably more flags?
  • Bytes 8+: EBCDIC copyright string. It can start with one or more EBCDIC spaces (0x40), and sometimes with 0xB4 (some kind of control character?).

The header is 256 bytes long. The copyright string is padded to the end of the header with 0x40 (EBCDIC space). However, it seems to have a null terminator (0x00) at the end of the actual text, before the space padding.
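The header layout described above can be sketched in Python as follows. This is only an illustration of the observations on this page, not a confirmed specification: the names `BooHeader` and `parse_boo_header` are made up for this example, and the choice of code page 037 (`cp037`) for decoding is an assumption, since the page does not say which EBCDIC code page the files use.

```python
from dataclasses import dataclass

HEADER_SIZE = 256  # header length as observed in the notes above


@dataclass
class BooHeader:
    flags: bytes      # bytes 1-2: meaning unknown, probably flags
    reserved: bytes   # bytes 3-6: observed to be all zero
    byte7: int        # byte 7: meaning unknown
    copyright: str    # decoded copyright text without padding


def parse_boo_header(data: bytes, codepage: str = "cp037") -> BooHeader:
    """Split the first 256 bytes of a BOO file into the observed fields.

    The code page is an assumption; pass another EBCDIC codec if needed.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 256 bytes of data")
    header = data[:HEADER_SIZE]
    raw_text = header[7:]                       # bytes 8+ (1-based numbering)
    text_part = raw_text.split(b"\x00", 1)[0]   # cut at the null terminator, if present
    text = text_part.decode(codepage, errors="replace").strip()
    return BooHeader(header[0:2], header[2:6], header[6], text)
```

Typical use would be `parse_boo_header(open(path, "rb").read(256))`. If the decoded string turns out to begin with a copyright sign, that may be what the 0xB4 byte stands for in the code page actually used; this is a guess worth checking against real files rather than an established fact.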

What follows from offset 256 onwards looks like it could be some kind of compression dictionary, maybe belonging to a modified TERSE algorithm?
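One way to probe the open question of whether the data after the header is compressed or merely obfuscated is to measure its byte entropy. Well-compressed data usually comes close to 8 bits per byte, while plain or lightly obfuscated text tends to score much lower. The sketch below is a generic Shannon entropy calculation, not something taken from IBM's tools; the function names and the 256-byte header offset are assumptions based on the notes on this page.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum((n / total) * -math.log2(n / total) for n in counts.values())


def body_entropy(path: str, header_size: int = 256) -> float:
    """Entropy of everything after the (assumed) 256-byte header."""
    with open(path, "rb") as f:
        f.seek(header_size)
        return byte_entropy(f.read())
```

A low value would support the impression that the data is too repetitive to be compressed. Trailing all-zero padding lowers the figure, so it should be stripped before drawing conclusions.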

An earlier observation, since struck out in the article, was that the file is composed of 256-byte records (so the file size should be an integer multiple of 256), often with some all-zeros records appended to the end.

There is weak evidence that the file format might be record-oriented with 4096-byte records: IBM's instructions for uploading books into MVS datasets hint at that. All files known so far have a size that is an exact multiple of 4096. Where the content does not fill the last record, it is padded with binary zeros.
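The record hypothesis above is easy to test against a collection of files. The following sketch counts whole records, leftover bytes, and all-zero records at the end of the file. The function name `record_stats` is hypothetical, and the default record size of 4096 is simply the value suggested above.

```python
RECORD_SIZE = 4096  # record length suggested by the file sizes seen so far


def record_stats(data: bytes, record_size: int = RECORD_SIZE) -> tuple[int, int, int]:
    """Return (full_records, leftover_bytes, trailing_zero_records)."""
    full, leftover = divmod(len(data), record_size)
    zero_record = bytes(record_size)
    trailing = 0
    # Walk backwards over the aligned records until one is not all zeros.
    for i in range(full - 1, -1, -1):
        if data[i * record_size:(i + 1) * record_size] != zero_record:
            break
        trailing += 1
    return full, leftover, trailing
```

A file that fits the hypothesis yields zero leftover bytes. Running the same check with `record_size=256` makes it possible to compare the two record sizes that have been proposed.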

See also

Weblinks

Footnotes