Cond Code 0 after failure

Discussion of Co:Z sftp, a port of OpenSSH sftp for z/OS
tsgapd
Posts: 4
Joined: Tue Apr 14, 2015 9:27 am

Cond Code 0 after failure

Post by tsgapd »

Running Co:Z SFTP version: 2.4.1 (5.0p1) 2013-06-24

Looks like we have been caught by the "Cond Code 0 after failure" (Topic 1296) issue: our SFTP jobs issue "FSUM7726 cannot fork: reason code = 0b250012: EDC5112I Resource temporarily unavailable." but still end with RC=00.

We have increased MAXPROCUSER by a factor of 5 to get round the issue.

From the last post on the subject it says that:
"We will be updating our sample shell scripts in the next release so that we pre-set an exit code so that if the shell fails with max-procs that it will be set."

Anyone aware of a new release that now contains these updates?

Thanks,

Andrew Davis
dovetail
Site Admin
Posts: 2022
Joined: Thu Jul 29, 2004 12:12 pm

Re: Cond Code 0 after failure

Post by dovetail »

Do you have a "-vvv" trace that shows this failure? If so, please email to info@dovetail.com and we will review.

I do not believe that this error - FSUM7726 (from IBM Ported Tools OpenSSH) is related to the update that we did for sample scripts.

Since the error (FSUM7726) is from IBM code, we suggest that you might want to open a PMR with them.
It is a known issue that fork() can fail with EAGAIN on z/OS, and in our code we retry - but this error is in IBM code.
dovetail
Site Admin
Posts: 2022
Joined: Thu Jul 29, 2004 12:12 pm

Re: Cond Code 0 after failure

Post by dovetail »

After review, we have determined that this *is* similar to: http://z.dovetail.com/forum/viewtopic.php?t=1296

In your case, the z/OS shell is executing your shell script and it fails when it tries to fork() the cozsftp command. The shell should exit with a non-zero return code; in our opinion, the fact that it does not is a bug in the shell. Please open a PMR with IBM and refer to 94417,756,000
In your case, the fork is not failing on a $(command), but just a regular command.

Also, since your shell script is failing to fork() cozsftp, this probably means that you are not setting _BPX_SHAREAS=YES in your /etc/profile. You want to do this, since otherwise cozsftp will run in a separate address space and you will not be able to use DDs.

You might be able to work around this defect (as we did in our samples/sftp-batch scripts) by inserting a "false" command on the line before executing cozsftp. This sets the exit code to 1, so if the shell then exits without setting an exit code, it will at least not be zero.
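For readers following along, the "false" workaround described above could be sketched roughly as follows. This is only an illustration: run_transfer is a hypothetical stand-in for the real cozsftp invocation, and the sketch assumes that "set -e" is not in effect, since a bare "false" would end the script immediately under "set -e".

```shell
#!/bin/sh
# Sketch of pre-setting a non-zero exit status before a fork-prone command.
# run_transfer is a hypothetical placeholder for the real cozsftp call.
run_transfer() {
    # e.g. cozsftp -b batchfile user@host   (not executed in this sketch)
    true
}

false           # last status is now 1; if the shell dies on the next line
                # without setting a status, the job should not end with 0
run_transfer    # on success this replaces the pre-set status with 0
rc=$?
echo "transfer rc=$rc"
```

When the transfer runs normally, its own exit status overrides the pre-set value, so the extra line only matters in the failure case.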
njd
Posts: 39
Joined: Fri Apr 24, 2015 5:57 am

Re: Cond Code 0 after failure

Post by njd »

If I add export _BPX_SHAREAS=YES and false to our FTP script just before executing cozsftp, I still get the following error: "/usr/local/coz/bin/cozsftp 14: FSUM7726 cannot fork: reason code = 0b250012: EDC5112I Resource temporarily unavailable. CoZBatch[I]: returning rc=exitcode=0" and the batch job finishes with rc=0. We have also managed to recreate this problem running cozsftp through BPXBATCH. Interestingly, if I run a test with false and set -e I always get rc=1. I'm not sure why this is. Anyway, we have raised a PMR with IBM, but if you have any further suggestions please let us know, as we have been able to recreate this problem on our test system.
dovetail
Site Admin
Posts: 2022
Joined: Thu Jul 29, 2004 12:12 pm

Re: Cond Code 0 after failure

Post by dovetail »

The cozsftp command is actually a shell script that does some configuration and then calls cozsftp_bin.

The line that is failing is:

export LOWER_LOGNAME=`echo $LOGNAME | tr "[:upper:]" "[:lower:]"`

This is because a back-tick will cause a new process to be forked. If you have no more processes available, then you will get a FSUM7726 cannot fork: reason code = 0b250012: EDC5112I Resource temporarily unavailable.

This is all fine, and as it should be.

The horrible defect in IBM's code is that they fail the shell script without setting an exit code.

I can only offer two suggestions:

1) configure your system so that the system overall and the userid used for the job does not hit a "Max Processes" limit.

2) Complain to IBM that not reporting an exitcode in this case is a vulnerability and that they should fix it.
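As an aside on the back-tick line quoted above: in a POSIX-style shell, "export VAR=`cmd`" reports the status of export rather than of the substitution, so a failure inside the back-ticks is easy to miss even on a shell that does set a status. A minimal sketch of separating the two steps follows; the sample LOGNAME value is an assumption for illustration, and this does not help on a shell that terminates outright when fork() fails.

```shell
#!/bin/sh
# Sample value for illustration only; a real job would use the actual LOGNAME.
LOGNAME=TSGAPD

# The back-tick substitution needs at least one new process.
# Assign first, so that $? reflects the substitution and not the export builtin.
LOWER_LOGNAME=`echo "$LOGNAME" | tr "[:upper:]" "[:lower:]"`
rc=$?
if [ "$rc" -ne 0 ]; then
    echo "could not derive LOWER_LOGNAME (rc=$rc)" >&2
fi
export LOWER_LOGNAME
echo "LOWER_LOGNAME=$LOWER_LOGNAME rc=$rc"
```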
njd
Posts: 39
Joined: Fri Apr 24, 2015 5:57 am

Re: Cond Code 0 after failure

Post by njd »

Agreed. We have raised the limits for our ftp userids so they don't hit "Max Processes", and have raised PMR 39137,999,866 with IBM.
njd
Posts: 39
Joined: Fri Apr 24, 2015 5:57 am

Re: Cond Code 0 after failure

Post by njd »

IBM have responded on PMR 39137,999,866. They suggest using the 'return' shell command in any scripts we run where we need to check the exit status. The test I have just done using the following script still seems to give me rc=0 if I do an echo $? after it runs:

#!/bin/sh
set -x
set -e
x=$(command -v /a/b/c)
return

However if I call test.sh using another script as follows I do see a proper return code set (e.g. 126):

#!/bin/sh
set -x
set -e
echo 'Starting fork_script...'
test.sh
return

I've gone back to IBM, but if you have any further thoughts let us know.
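The second test above suggests one possible hedge: run the fork-prone work in a child shell and check the child's status explicitly. The following is only a sketch under that assumption. run_child is a hypothetical helper name, /a/b/c is the non-existent path from the test above, and the approach does not help if the outer shell itself cannot fork the child.

```shell
#!/bin/sh
# run_child is a hypothetical helper: it runs a command line in a child shell
# and returns the child's exit status explicitly.
run_child() {
    sh -c "$1"
    return $?
}

# /a/b/c is assumed not to exist, so the child ends with a non-zero status.
run_child 'command -v /a/b/c >/dev/null 2>&1'
rc=$?
echo "child rc=$rc"
```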
dovetail
Site Admin
Posts: 2022
Joined: Thu Jul 29, 2004 12:12 pm

Re: Cond Code 0 after failure

Post by dovetail »

I don't see how using "return" helps the problem.

If you wouldn't mind, please email me the details from the PMR to info@dovetail.com
njd
Posts: 39
Joined: Fri Apr 24, 2015 5:57 am

Re: Cond Code 0 after failure

Post by njd »

email on its way with pmr contents
dovetail
Site Admin
Posts: 2022
Joined: Thu Jul 29, 2004 12:12 pm

Re: Cond Code 0 after failure

Post by dovetail »

IBM is trying to say that this is "expected behavior":

"Documented in the USS Command Reference for 'sh', the exit status of the shell defaults to the exit status of the last command run by the shell. This default can be overridden by explicit use of the exit or return commands."

In effect, they are saying that if the z/OS Unix shell fails because it cannot fork, the fork error will not cause the exit code to be set, and that they consider this to be the correct behavior. This is ridiculous, and we have found that no other shell actually works that way.

Please reply to IBM and ask them to consider the following scenario:

A critical production batch job runs a shell script with BPXBATCH:

// EXEC PGM=BPXBATCH,PARM='SH importantscript.sh'

The "SH" option will run the user's login shell, which will cause the site's /etc/profile shell script to run BEFORE running importantscript.sh.
But (as you have seen yourself) - if /etc/profile does anything to cause a fork() that runs over a site or user limit, this will cause the job step to exit with CC=0 - without ever running importantscript.sh.

Will IBM be updating all of the documentation related to customizing z/OS Unix to make it clear that all shell scripts must be changed (by inserting "false" commands before ALL possible fork or spawns) so that shell scripts do not silently fail with exitCode=0?

I suppose that we have no choice other than to do this for our shell scripts, but this doesn't close all of the holes caused by this problem.
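One further hedge, which is not something described in this thread and is offered purely as an assumption, is to treat a zero exit status as success only when the script also reached its last line, for example by having it leave a completion marker. A minimal simulation of the idea follows. The marker file name is hypothetical, and in a real job the check would have to run outside the shell that might fail (for instance in a later job step).

```shell
#!/bin/sh
# Hypothetical marker file that the script writes only when it reaches its end.
marker="${TMPDIR:-/tmp}/importantscript.done.$$"
rm -f "$marker"

# Simulated script body; "exit 0" stands in for a shell that stops early
# while still reporting status 0.
(
    exit 0
    : > "$marker"    # never reached in this simulation
)
rc=$?

# A zero status without the marker is treated as a failure.
if [ "$rc" -eq 0 ] && [ ! -f "$marker" ]; then
    echo "script ended early with rc=0; treating as failure"
    rc=8
fi
rm -f "$marker"
echo "final rc=$rc"
```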
njd
Posts: 39
Joined: Fri Apr 24, 2015 5:57 am

Re: Cond Code 0 after failure

Post by njd »

PMR 39137,999,866 is now with the IBM z/OS Unix Development team
njd
Posts: 39
Joined: Fri Apr 24, 2015 5:57 am

Re: Cond Code 0 after failure

Post by njd »

IBM APAR OA47887 has been created to resolve this issue.
njd
Posts: 39
Joined: Fri Apr 24, 2015 5:57 am

Re: Cond Code 0 after failure

Post by njd »

See URL http://www-01.ibm.com/support/docview.w ... sg1OA47887 . The z/OS 2.1 fix is due 28th August.
njd
Posts: 39
Joined: Fri Apr 24, 2015 5:57 am

Re: Cond Code 0 after failure

Post by njd »

Formal fixes are now available:

R7A0 PSY UA77975 UP15/07/22 P F507
R780 PSY UA77976 UP15/07/22 P F507
R790 PSY UA77977 UP15/07/22 P F507
dovetail
Site Admin
Posts: 2022
Joined: Thu Jul 29, 2004 12:12 pm

Re: Cond Code 0 after failure

Post by dovetail »

Thanks for posting this information.

Have you tested these fixes and confirmed that they solve your problem?