Inside Technique: Legacy Data and the Web: Getting It to the Web!
Legacy Data and the Web Series
We promised to tell you how to get specific data streams to the web. As with all adventures in technology, there will be several paths you can take. Look over the information we give you and then have a serious chat with all of the people in your organization to see what path is most appropriate.
We are going to concentrate this time on the big corporate datastreams from IBM and Xerox.
In the IBM world you tend to have line data, line data conditioned with Advanced Function Printing (AFP) Page Definitions (pagedefs) and Form Definitions (formdefs), plus the true composed AFP datastream and its variations produced by ACIF, a packaging tool for archiving AFP datastreams. Let's not forget that many vendors have products that produce AFP print streams, and quite a few vendors manufacture AFP printers. This means that there may be variations in the AFP datastream depending on which printer you normally target.
We talked about moving line data to the web in the last edition of this column, so we'll refer you to the Site Experts archive for more information there. If you have line data files that are conditioned with pagedefs and formdefs, which is the vast majority of all AFP printing today, you have to ensure that the fonts, the graphics, the called overlays, and the conditions programmed into the page formats and copygroups used in the pagedefs and formdefs are handled appropriately for your output to the web. The fonts are a fairly straightforward issue by now. If you've read the earlier columns you know to look for what fonts are used in your applications and what options you have to mirror them on the web. For most line data in an IBM print environment the fonts have no real equivalent in the web world. This is where the hard decisions come into play. If you must have an exact duplicate you may be forced to burn your print file to an image format and display the image. That is messy and bandwidth-consuming at best, and hard to read for most of us as well. If you have more flexibility, try selecting something like Courier or the MS Terminal font and see how that works for your application.
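To make the "try Courier" option concrete, here is a minimal sketch in Python of wrapping fixed-pitch print lines in an HTML page so that column alignment survives in the browser. The function name, the font stack, and the 10pt size are our own assumptions for illustration; tune them against your printed output.

```python
import html

# Assumed fallback stack: the browser uses the first font it can find.
FONT_STACK = '"Courier New", Courier, monospace'


def lines_to_html(lines, title="Report"):
    """Wrap fixed-pitch print lines in a <pre> block to keep columns aligned."""
    # Escape &, <, and > so report text is never mistaken for markup.
    body = "\n".join(html.escape(line.rstrip("\r\n")) for line in lines)
    return (
        "<html><head><title>" + html.escape(title) + "</title></head>\n"
        "<body><pre style='font-family: " + FONT_STACK + "; font-size: 10pt'>\n"
        + body
        + "\n</pre></body></html>"
    )
```

Because the text sits inside a `<pre>` element with a monospaced font, spaces and line breaks are kept as they were in the print file. It will not look like the original printer font, but the columns will line up.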
All of the processing that handles overlays and programming in pagedefs and formdefs must be handled by some process to move the print file to the web. If you have skilled exec writers you may be able to handle it in house by writing procs and execs to re-process the file instead of feeding it through the process that creates the AFP print file. This takes an extremely in-depth knowledge of both AFP and the underlying architecture. Both pagedefs and formdefs can alter how the data is placed on the output page, as well as controlling the overlay calls and the pagination.
It may not seem like a lot of work, but there is a tremendous amount of programming behind these tasks. Consider things like conditional processing, which reads triggers in the datastream, and pagination that is intelligent about simplex and duplex printing. And don't forget orientation issues. If you have a pagedef/formdef combination that allows for tumble/duplex printing (every other page is upside down), what happens when you put that on the web?
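To give a feel for what a home-grown exec has to reproduce, here is a much-simplified sketch in Python of pagination driven by ANSI carriage-control characters plus one conditional trigger. It assumes each record begins with a carriage-control byte ('1' for a new page, a blank for single spacing, '0' for double, '-' for triple). The trigger text "STMT" and its column are hypothetical; a real pagedef can carry many such conditions, and this sketch ignores overprint ('+'), channel skips, and duplex handling entirely.

```python
def paginate(records, trigger="STMT", trigger_col=1):
    """Split carriage-controlled line data into pages (lists of text lines).

    `trigger` and `trigger_col` (1-based, counted after the control byte)
    are illustrative stand-ins for a condition coded in a pagedef.
    """
    pages, current = [], []
    for rec in records:
        cc, text = rec[:1], rec[1:]
        # A new page starts on carriage control '1' or when the trigger
        # text appears at the expected column of the record.
        starts_page = cc == "1" or text.startswith(trigger, trigger_col - 1)
        if starts_page and current:
            pages.append(current)
            current = []
        # Double and triple spacing become blank lines before the text.
        if cc == "0":
            current.append("")
        elif cc == "-":
            current.extend(["", ""])
        current.append(text)
    if current:
        pages.append(current)
    return pages
```

For example, feeding it `["1STMT 001", " line a", "0line b", " STMT 002", " line c"]` yields two pages, the second one started by the trigger rather than by a carriage control. On the web every page is displayed upright, so the orientation question becomes one of deciding what each logical page should look like on screen rather than how the paper was flipped.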
If your target application was written in house, the more common way of getting to the web is to acquire one of the many transform products that can handle the datastream accurately and allow you to tune the output for the web without touching the original program. If you are using a vendor program to produce your print, you should start by asking whether the vendor has a filter or plug-in that provides the transform inside of its program environment.
The same rules apply to the composed form of AFP. Check with your vendors first, and if they cannot help, check with the transform vendors to see what they have to offer. Remind them of the font issues you face and the needs you have with regard to your graphics. There are big resolution differences between print and screen graphics to contend with, so you need to be prepared for some compromises. This is now a mature industry and there are many options available. Those lists you made will help you discuss your needs and current print file specifications with your vendors so that they can give you ideas of what they have to offer.
If you are a Xerox customer your printers may print line data, line data conditioned with DJDEs (Dynamic Job Descriptor Entries), or metacode. As in the IBM world, line data is still the most common print format, and line data conditioned with DJDEs is found around the world. There is a tremendous variation in how DJDEs are programmed and, consequently, a huge difference in how print files may behave when put through transform processes. When line data and DJDEs are combined with metacode files, as they often are, the complexity involved in moving the data to the web is immense, but it can be done successfully.
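As a small illustration of the first step any home-grown process faces, here is a sketch in Python that separates DJDE control records from the printable data. DJDE records are recognized by an identifying prefix at a known position in the record; both values are set up per site, so the `$DJDE$` prefix and offset 0 used here are assumptions, as is the sample record in the usage note. The sketch only pulls the records apart. Interpreting what each DJDE does to the pages that follow is where the real work lies.

```python
def split_djde(records, prefix="$DJDE$", offset=0):
    """Return (controls, data): DJDE parameter strings and printable records.

    `prefix` and `offset` are site-specific assumptions; check how your
    own jobs identify DJDE records before relying on these defaults.
    """
    controls, data = [], []
    for rec in records:
        if rec.startswith(prefix, offset):
            # Keep only the parameter text that follows the prefix.
            controls.append(rec[offset + len(prefix):].strip())
        else:
            data.append(rec)
    return controls, data
```

Given the records `["$DJDE$ FORMS=INV01,END;", "1INVOICE 1001", " total 10.00"]`, the first comes back as a control string and the other two as data.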
First you have the font issues to contend with, as always. If you are using the ubiquitous P0612b or P06BOB fonts you have the challenge of finding web fonts that will do the trick for you. This is going to be tough, but you can do it. Font vendors like ASE sell Windows-friendly versions of the full Xerox font set, and this may do the trick for you. Remember that if you rely on them you will need to ensure that they are available to the browser of everyone who needs to see the pages that use the font. There are several solutions, including some new technologies that allow you to put the font in the file. They are worth checking out.
Then there are the graphics to contend with. Remember that the file formats are quite different, and there is always the chance that the web version of a graphic may not be what you had hoped. Adding to the fun is the fact that Xerox resources generally reside on the printer, where the programs and transforms can't get access to them. This means mirroring the printer hard drives on a hard drive that the programs you write or buy can see. Remember this as you begin configuring your environment.
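Once you have a mirror, it pays to check it against the list of resources your jobs actually call. Here is a minimal sketch in Python that does so. The function name and the sample file names in the usage note are hypothetical, and we assume one flat directory with names compared case-insensitively; adjust for however your mirror is laid out.

```python
from pathlib import Path


def find_missing(resource_names, mirror_dir):
    """Return the resources a job needs that are absent from the mirror."""
    # Compare in upper case, since printer resource names are often
    # upper case while copied files may not be.
    available = {
        p.name.upper() for p in Path(mirror_dir).iterdir() if p.is_file()
    }
    return sorted(n for n in resource_names if n.upper() not in available)
```

Calling `find_missing(["P0612B.FNT", "INV01.FRM", "LOGO1.LGO"], "/mirror/xerox")` returns the names that still need to be copied from the printer. Running a check like this before a transform job is cheaper than discovering a missing logo on a live web page.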
Then you have to decide how best to get to the web with your Xerox files. As in the IBM world, you may choose to write your own transform. In the complex world of the Xerox print stream you will need an expert to get you through the process. Line data can move to the web with intervention from a proc or exec with fair success. Metacode is another story, though. It is a proprietary datastream owned by Xerox, and therefore documentation on its nuances is hard to come by.
We tend to recommend a close look at the vendors who operate in this niche of the market, since they've fought long and hard to develop solutions that will work in a production environment. Companies like The Xenos Group, Elixir, Emtex, and SysPrint have long histories in dealing with the guts of the high-speed printer datastreams. IBM and Xerox, as well as Oce and i-data, have groups in their organizations who can help with the issues surrounding getting to the web as well.
Remember that if you go the transform route, the transform has to know everything the printer knows. This is a big job. That's what all of the lists are for, and this is why you need systems folks to tell you how the current printing environment is configured. That configuration includes the control software, which is the operating system on a Xerox printer, but something like PSF (Print Services Facility) software in an AFP environment. We used the phrase "something like" because there are a number of variations on PSF within the IBM world, plus competing products in the Oce world.
Since we've mentioned going to vendors, we should disclose that we used to be one. We began a company called GenText in 1989 and spent most of the decade of the '90s developing the datastream transform and filter products that came to be used in the majority of COLD/Archive products and now in many of the large in-house implementations of web-serving legacy data. We sold the company in 1998 and have gone the consulting/speaking/education route ever since.
Next time we'll take a look at how to solve problems when the data on the web just doesn't look right. As always,
send your questions to
Copyright 2000 McGrew + McDaniel Group, Inc.
© 1997-2000 InsideDHTML.com, LLC. All rights reserved.