Character encoding from FileFactory

General discussion on the JZOS batch launcher and toolkit
mwilliam
Posts: 37
Joined: Mon Oct 11, 2004 3:21 pm

Character encoding from FileFactory

Post by mwilliam »

Hello,
I'm curious to know whether it's possible to run with a default code page of file.encoding=ISO8859-1 and have a BufferedReader read a file stored in EBCDIC. With the reader returned by the method FileFactory.newBufferedReader("//DD:SYSPARM", ...), would it be possible to have the reader return ISO characters that were converted from EBCDIC?
coz
Posts: 392
Joined: Fri Jul 30, 2004 5:29 pm

Post by coz »

Sure -

Just use the FileFactory method:
public static BufferedReader newBufferedReader(String filename, String encoding)

Pass an EBCDIC encoding and you can read from your DD irrespective of the file.encoding you are running with.
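
For example, a minimal sketch (the DD name SYSPARM and the Cp1047 code page are just illustrative choices):

import java.io.BufferedReader;
import com.ibm.jzos.FileFactory;

// Explicit EBCDIC encoding, independent of the JVM's file.encoding
BufferedReader rdr = FileFactory.newBufferedReader("//DD:SYSPARM", "Cp1047");
String line;
while ((line = rdr.readLine()) != null) {
    System.out.println(line);   // each line is already converted to a Unicode String
}
rdr.close();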
dovetail
Site Admin
Posts: 2025
Joined: Thu Jul 29, 2004 12:12 pm

Post by dovetail »

Actually, the default encoding used by the FileFactory when creating readers and writers on MVS datasets is ZFile.DEFAULT_EBCDIC_CODE_PAGE, which defaults to Cp1047. This made some sense to us at the time - it's reasonable to default to EBCDIC when reading datasets. The default for FileFactory when reading HFS files on z/OS (or files on other platforms) is to use the default JVM file.encoding.

So, you should be able to use the default encoding with FileFactory and read/write EBCDIC text files regardless of your default JVM file.encoding.
As Steve points out, there are also overloaded methods which allow you to specify a particular encoding.
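
For example, a sketch relying only on the default (the DD name SYSPARM is just for illustration):

import java.io.BufferedReader;
import com.ibm.jzos.FileFactory;

// No encoding argument: for an MVS dataset the reader defaults to
// ZFile.DEFAULT_EBCDIC_CODE_PAGE (Cp1047), even if file.encoding is ISO8859-1
BufferedReader rdr = FileFactory.newBufferedReader("//DD:SYSPARM");
String line = rdr.readLine();   // already converted from EBCDIC
rdr.close();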

Actually, I think that FileFactory should probably use ZUtil.defaultOutputEncoding, which is configured at startup to match the Locale's default encoding... this would be Cp1047 for many users, but might be different for, say, German or French shops that use a language-specific EBCDIC codepage as their default. I'll look into correcting this in the next release.

Feel free to browse the source code for FileFactory, ZFile, ZUtil, etc. We welcome any suggestions for improvement.
mwilliam
Posts: 37
Joined: Mon Oct 11, 2004 3:21 pm

Post by mwilliam »

Well,
my actual intent was to process XML documents stored in non-HFS datasets and/or files containing native EBCDIC characters. I was attempting to override the URL file resolution phase of the "org.apache.xerces.parsers.SAXParser" parser by supplying the BufferedReader object returned by the FileFactory class.
To make a long story short, it could have worked, but the XML parser still attempted to resolve the URL (file name) to an actual file.

As noted, the JRIO and ZFile classes denote an MVS dataset (or ddname) with the '//' notation. In the URL resolution phase, however, this denotes the name of a server.

On a positive note, almost two years ago, after attempting to locate a Java XML parser that could process non-ASCII documents, I wound up downloading and modifying the source to suit my XML requirements. So I have a modified version of the Piccolo XML parser, which uses the JRIO classes to process XML documents stored in EBCDIC (provided that the default code page is Cp1047). This has been suitable for my initial needs.

The problem is that this parser lacks the advanced features of the standard Apache XML parser.
To avoid reinventing the wheel again, has anyone found a suitable XML parser which functions properly regardless of the default code page and can read MVS non-HFS datasets?
dovetail
Site Admin
Posts: 2025
Joined: Thu Jul 29, 2004 12:12 pm

Post by dovetail »

The Xerces parser will do this; the trick is to use an org.xml.sax.InputSource.

There are a couple of ways to handle this:

1) If you construct an InputSource on an InputStream (binary), then the XML parser will "detect" the codepage by getting the encoding from the XML prologue. (BTW, if not specified in the prologue, the default encoding is UTF-8).

for example:
ZFile file = new ZFile("//DD:XML", "rt");
InputSource is = new InputSource(file.getInputStream());
saxParser.parse(is, handler);

2) If you construct an InputSource on a Reader, then the parser will use the reader's encoding, regardless of the XML prologue.

for example:
Reader rdr = FileFactory.newBufferedReader("//DD:XML");
InputSource is = new InputSource(rdr);
saxParser.parse(is, handler);

This will parse using the JZOS default codepage (EBCDIC), since that's the default reader encoding for FileFactory (if not explicitly specified). This method is handy when the XML doesn't contain an encoding in the prologue, or when the prologue indicates an ASCII encoding even though the actual data is EBCDIC. This might happen if an XML file is uploaded in text mode via FTP.
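
For instance, a sketch that forces an explicit EBCDIC reader even though the prologue claims an ASCII encoding (the Cp1047 code page here is just an example):

import java.io.Reader;
import org.xml.sax.InputSource;
import com.ibm.jzos.FileFactory;

// The reader's Cp1047 encoding wins over whatever the XML prologue declares
Reader rdr = FileFactory.newBufferedReader("//DD:XML", "Cp1047");
InputSource is = new InputSource(rdr);
saxParser.parse(is, handler);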
mwilliam
Posts: 37
Joined: Mon Oct 11, 2004 3:21 pm

Post by mwilliam »

Thanks a lot for the info.

Odd though, because I have been using the Xerces parser from the beginning. I didn't realize there were so many methods.

From your previous example, I had a devil of a time trying to figure out the proper parser classes and methods to use. I found that the following definitions worked best for me.


import java.io.Reader;
import javax.xml.parsers.SAXParser;
import com.ibm.jzos.FileFactory;
import org.apache.xerces.jaxp.SAXParserFactoryImpl;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XMLParseConf extends DefaultHandler
{
    private SAXParserFactoryImpl parserFactory = new SAXParserFactoryImpl();
    private SAXParser saxParser;

    public void parseDD() throws Exception
    {
        saxParser = parserFactory.newSAXParser();

        // Reader defaults to the JZOS EBCDIC code page (Cp1047) for the DD
        Reader rdr = FileFactory.newBufferedReader("//DD:XML");
        InputSource is = new InputSource(rdr);
        saxParser.parse(is, this);
    }
}