Cond Code 0 after failure

Discussion of Co:Z sftp, a port of OpenSSH sftp for z/OS
PaulB42
Posts: 27
Joined: Fri Aug 21, 2009 5:32 am

Cond Code 0 after failure

Post by PaulB42 »

We experienced a couple of errors similar to the following over the new year period:

CoZBatch[N]: Copyright (C) 2005-2009 Dovetailed Technologies LLC. All rights reserved.
CoZBatch[N]: version 2.0.1 2012-01-14
CoZBatch[I]: executing progname=login-shell="-/bin/sh"
/usr/lpp/coz/bin/cozsftp 16: FSUM7726 cannot fork: reason code = 0b250012: EDC5112I Resource temporarily unavailable.
CoZBatch[I]: returning rc=exitcode=0

I haven't investigated the 0b250012 yet, but I'm more concerned that we get a condition code of 0. Because of this, the failures were not spotted, and the customer lost some files.

Is there some method of trapping this error and returning a non-zero cc?

Thanks
dovetail
Site Admin
Posts: 2025
Joined: Thu Jul 29, 2004 12:12 pm

Re: Cond Code 0 after failure

Post by dovetail »

Please post the input script / job / JCL that you are using.
PaulB42
Posts: 27
Joined: Fri Aug 21, 2009 5:32 am

Re: Cond Code 0 after failure

Post by PaulB42 »

I think the 0b250012 is because the values of MAXPROCSYS or MAXPROCUSER were too small; I have now increased them.

Here is the JCL and script:
//DOWNLOAD EXEC PGM=COZBATCH
//STDIN DD *
# Co:Z batch input follows ...
coz_bin="/usr/lpp/coz/bin"
remoteuser="xxxxxxx_axa_xxxxxx"
server="XXX_XXXXXXX"
clientcp="IBM285"
servercp="ISO8859-1"

export DISPLAY=none
export _BPX_SHAREAS=YES

ssh_opts="-oBatchMode=no"
ssh_opts="$ssh_opts -oConnectTimeout=60"
ssh_opts="$ssh_opts -oServerAliveInterval=60"
ssh_opts="$ssh_opts -oStrictHostKeyChecking=no"
# Invoke the Co:Z sftp client

$coz_bin/cozsftp $ssh_opts -b- $remoteuser@$server <<EOB
lzopts mode=text,servercp=$servercp,clientcp=$clientcp,linerule=crlf
lzopts -a
cd /Actuarial
ls
put //'XXXX.XXXX.EXT' XXXXXX.C12
EOB
/*
dovetail
Site Admin
Posts: 2025
Joined: Thu Jul 29, 2004 12:12 pm

Re: Cond Code 0 after failure

Post by dovetail »

I agree that this is a problem, but I'm pretty sure that it is an issue with the z/OS Unix shell.

Co:Z Batch executes the z/OS Unix Shell (/bin/sh) and pipes the input from DD:STDIN into it.
The shell is getting an error (FSUM7726), but it is not setting the exitCode, which I believe that it should.

We will try to reproduce this, and if my suspicion is confirmed, we will report the bug to IBM. We have reported similar problems in the past (e.g. OA40087), and IBM has fixed them.
dovetail
Site Admin
Posts: 2025
Joined: Thu Jul 29, 2004 12:12 pm

Re: Cond Code 0 after failure

Post by dovetail »

Sorry, I forgot to ask - what version of z/OS are you running?
PaulB42
Posts: 27
Joined: Fri Aug 21, 2009 5:32 am

Re: Cond Code 0 after failure

Post by PaulB42 »

Many thanks.
The affected system is z/OS 1.11
dovetail
Site Admin
Posts: 2025
Joined: Thu Jul 29, 2004 12:12 pm

Re: Cond Code 0 after failure

Post by dovetail »

We were able to reproduce the problem.

The z/OS Unix shell can fail with this error and not set a non-zero exit code.
In this particular case, the error occurred in the "cozsftp" command, which is a shell script. When that command failed, an exit code should have been set; it would then have been adopted as the exit code of the top-level login shell and, in turn, by Co:Z Batch.

We have reported the problem to IBM.
PaulB42
Posts: 27
Joined: Fri Aug 21, 2009 5:32 am

Re: Cond Code 0 after failure

Post by PaulB42 »

Hi

Do you have an IBM PMR number for this problem?
Our problem management want some kind of reference number so I can prove that I am tracking this problem.

Thanks
Paul
dovetail
Site Admin
Posts: 2025
Joined: Thu Jul 29, 2004 12:12 pm

Re: Cond Code 0 after failure

Post by dovetail »

The PMR is 94417,756,000

Below are the steps to reproduce the problem. Please verify that you can reproduce it, and if so, I would suggest that you also open a problem with IBM.

1) Set the "PROCUSERMAX" in the OMVS segment for a z/OS userid to 3.
*** The userid must be a regular userid - not UID=0 ****
2) Log in to the userid under a z/OS shell (I used an ssh shell, so that I have consumed 2 processes):

/u/vendor/kirk>ps -ef
UID PID PPID C STIME TTY TIME CMD
KIRK 83951640 67174420 - 11:53:57 ? 0:00 /us/sbin/sshd -f /etc/ssh/sshd_config
KIRK 16842785 83951640 - 11:53:57 ttyp0000 0:00 -sh
KIRK 83951650 16842785 - 11:56:45 ttyp0000 0:00 ps -ef

3) create an executable shell script called "t.sh" -

#!/bin/sh
x=$(command -v /a/b/c)
rc=$?
echo x=$x rc=$rc
exit $rc

4) execute the test shell script:

> ./t.sh
./t.sh 2: FSUM7726 cannot fork: reason code = 0b250012: EDC5112I Resource temporarily unavailable.
/u/vendor/kirk>echo $?
0

As you can see - the script fails and stops executing on line 2, which
is to be expected. However, the "last exit code" is not set to a
non-zero value. (It should probably be 126).
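
For comparison, here is a minimal sketch of the same two statements on a shell where the fork succeeds. It assumes any POSIX-style /bin/sh with free process slots and has not been run on z/OS. The final "exit" from t.sh is left out so that the values can be inspected afterwards.

```shell
#!/bin/sh
# errexit off, so the failing assignment does not end the script
# before rc is captured
set +e

# /a/b/c does not exist, so "command -v" prints nothing and fails
x=$(command -v /a/b/c)

# With no command name on the line, the assignment takes the exit
# status of the command substitution
rc=$?
echo "x=$x rc=$rc"
```

Here rc comes out non-zero and x is empty, which is the result a caller would reasonably expect to see when the substitution cannot even be started.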
PaulB42
Posts: 27
Joined: Fri Aug 21, 2009 5:32 am

Re: Cond Code 0 after failure

Post by PaulB42 »

Hello Kirk

many thanks for this. I have reproduced the problem and raised a PMR (92007 001 866)

Paul
PaulB42
Posts: 27
Joined: Fri Aug 21, 2009 5:32 am

Re: Cond Code 0 after failure

Post by PaulB42 »

I have had a response from IBM as follows. Essentially, they are unlikely to fix this.
Action taken: we had a very long internal discussion about this issue.
Here is a recap of the developer's investigation.

Processing of "sh ls" (or of "/bin/ls" in a shell script, for example) is very different from processing of "( )", as in x=$(command -v /a/b/c) from the test script reported earlier in the PMR. The former invokes "spawn and exec" of the specified command for processing, whereas the latter invokes fork and sets up the environment for a compound construct to run multiple commands and handle signals.

In the case of "sh ls", with "spawn and exec" processing of the command, we technically "attempted to invoke" the command and failed, and therefore returned the failed RC. In the case of "( )" processing, the S&U failed during the setup of the forked environment and thus "never attempted to invoke" the command specified in the "( )". Therefore, it returned the last command's RC.

From our investigation and understanding of the code and documentation, it is working as designed. We do not want to change the behavior of "( )" compound construct processing to return the RC of environment setup failures in the service stream, since many customers may have coded for this behavior (incorrectly or not). The FSUM7726 error message clearly identifies the error at this point. We would certainly take an MR and look into changing this in the development release.

So, the conclusion is that the shell code is working as designed.
A marketing requirement may be opened in order to request a change
of shell behaviour.
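
Since IBM's position is that the FSUM7726 message itself identifies the error, one possible approach is to capture stderr in a file and scan it for that message ID. The sketch below is only an illustration: the helper name has_fork_failure and the log path are hypothetical, and it has not been tested on z/OS. The scan uses only read and case, on the assumption that these are shell built-ins and so need no further fork.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file named by $1 contains FSUM7726.
# The loop uses only read and case, so the scan itself starts no process.
has_fork_failure() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *FSUM7726*) return 0 ;;
        esac
    done < "$1"
    return 1
}

# Demonstration with a sample log. The path is an assumption; a real job
# would instead redirect the stderr of the failing command to this file.
log="${TMPDIR:-/tmp}/cozsftp_stderr.$$"
printf '%s\n' '/usr/lpp/coz/bin/cozsftp 16: FSUM7726 cannot fork: reason code = 0b250012' > "$log"

if has_fork_failure "$log"; then
    echo "fork failure detected; a real job would exit non-zero here"
fi
rm -f "$log"
```

In a batch script, the "echo" in the if branch would be replaced by an "exit" with whatever non-zero code the site uses for failed transfers.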
dovetail
Site Admin
Posts: 2025
Joined: Thu Jul 29, 2004 12:12 pm

Re: Cond Code 0 after failure

Post by dovetail »

Thanks for this info. We will look into this a little more to see if there is anything we can do to work around this issue in the z/OS shell.
In the meantime, please set MAXPROCSYS or MAXPROCUSER high enough to avoid this issue.
dovetail
Site Admin
Posts: 2025
Joined: Thu Jul 29, 2004 12:12 pm

Re: Cond Code 0 after failure

Post by dovetail »

In the next release, we will update our sample shell scripts to pre-set an exit code, so that a non-zero code is returned if the shell fails because a process limit has been reached.
In the meantime, a workaround is to avoid the failure by setting MAXPROCSYS or MAXPROCUSER appropriately for your environment.
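
As a rough illustration of what pre-setting an exit code could look like (this is a guess at the idea, not the actual change to the Co:Z sample scripts, and it is untested on z/OS): the helper preset_rc and the value 12 are hypothetical. The sketch relies on IBM's statement that a failed "( )" setup returns the last command's RC, so the last command before the substitution is made to leave a non-zero value behind.

```shell
#!/bin/sh
# errexit must be off, otherwise preset_rc itself would end the script
set +e

# Hypothetical helper: sets $? to the given value. It is a shell
# function, so calling it does not require a new process.
preset_rc() {
    return "$1"
}

# Leave a non-zero "last RC" behind. If the fork for $( ) then fails as
# IBM describes, the script should end with 12 rather than 0.
preset_rc 12
x=$(command -v sh)

# When the fork succeeds, the substitution's own status replaces the 12
rc=$?
echo "x=$x rc=$rc"
```

On a system with free process slots the pre-set value has no effect: rc reflects the result of "command -v sh" as usual.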

It is unfortunate that IBM doesn't agree that they have a bug - we will also pursue this with them.

Thanks for reporting this.