Cond Code 0 after failure
Running Co:Z SFTP version: 2.4.1 (5.0p1) 2013-06-24
Looks like we got caught with the "Cond Code 0 after failure" (Topic 1296) issue with SFTPs issuing "FSUM7726 cannot fork: reason code = 0b250012: EDC5112I Resource temporarily unavailable." but still returning a RC00.
We have increased MAXPROCUSER by a factor of 5 to get round the issue.
The last post on the subject says:
"We will be updating our sample shell scripts in the next release so that we pre-set an exit code so that if the shell fails with max-procs that it will be set."
Anyone aware of a new release that now contains these updates?
Thanks,
Andrew Davis
Re: Cond Code 0 after failure
Do you have a "-vvv" trace that shows this failure? If so, please email to info@dovetail.com and we will review.
I do not believe that this error - FSUM7726 (from IBM Ported Tools OpenSSH) is related to the update that we did for sample scripts.
Since the error (FSUM7726) is from IBM code, we suggest that you might want to open a PMR with them.
It is a known issue that fork() can fail with EAGAIN on z/OS, and in our code we retry - but this error is in IBM code.
Re: Cond Code 0 after failure
After review, we have determined that this *is* similar to: http://z.dovetail.com/forum/viewtopic.php?t=1296
In your case, the z/OS shell is executing your shell script and it fails when it tries to fork() the cozsftp command. The shell should exit with a non-zero return code; that it does not is, in our opinion, a bug in the shell. Please open a PMR with IBM and refer to 94417,756,000.
In your case, the fork is not failing on a $(command), but just a regular command.
Also, since your shell script is failing to fork() cozsftp, this probably means that you are not setting _BPX_SHAREAS=YES in your /etc/profile. You want to do this, since otherwise cozsftp will run in a separate address space and you will not be able to use DDs.
You might be able to work around this defect (as we did in our samples/sftp-batch scripts) by inserting a "false" command on the line before executing cozsftp. This will set the exitCode=1, and if the shell exits without setting the exit code it will at least not be zero.
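A minimal sketch of the "false" workaround described above, assuming a POSIX-style shell. The names run_transfer and transfer_step are hypothetical, and run_transfer is only a placeholder for the site's real cozsftp invocation. It illustrates the ordering only; whether it helps depends on where the shell fails to fork.

```shell
#!/bin/sh
# run_transfer is a hypothetical stand-in for the real cozsftp invocation.
run_transfer() {
    return 0
}

transfer_step() {
    # Pre-set the last exit status to 1 immediately before the command.
    false
    # When this command runs to completion, its own status replaces the 1.
    run_transfer
}

# Capture the status of the step whether it succeeds or fails.
if transfer_step; then
    rc=0
else
    rc=$?
fi
echo "transfer rc=$rc"
```

If the shell stops between false and the command without setting a status, the status left behind is 1 rather than 0.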
Re: Cond Code 0 after failure
If I add "export _BPX_SHAREAS=YES" and "false" to our FTP script just before executing cozsftp, I still get the following error: "/usr/local/coz/bin/cozsftp 14: FSUM7726 cannot fork: reason code = 0b250012: EDC5112I Resource temporarily unavailable. CoZBatch[I]: returning rc=exitcode=0" and the batch job finishes with rc=0. We have also managed to recreate this problem running cozsftp through BPXBATCH. Interestingly, if I run a test with "false" and "set -e" I always get rc=1. I'm not sure why this is. Anyway, we have raised a PMR with IBM, but if you have any further suggestions please let us know, as we have been able to recreate this problem on our test system.
Re: Cond Code 0 after failure
The cozsftp command is actually a shell script that does some configuration and then calls cozsftp_bin.
The line that is failing is:
export LOWER_LOGNAME=`echo $LOGNAME | tr "[:upper:]" "[:lower:]"`
This is because a back-tick will cause a new process to be forked. If you have no more processes available, then you will get a FSUM7726 cannot fork: reason code = 0b250012: EDC5112I Resource temporarily unavailable.
This is all fine, and as it should be.
The horrible defect in IBM's code is that they fail the shell script without setting an exit code.
I can only offer two suggestions:
1) Configure your system so that the system overall and the userid used for the job do not hit a "Max Processes" limit.
2) Complain to IBM that not reporting an exitcode in this case is a vulnerability and that they should fix it.
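As a side note on the failing line, here is a hedged sketch of how the same value could be derived while keeping the status of the command substitution visible. In POSIX-style shells, "export VAR=`cmd`" reports the status of export rather than of cmd, so the assignment is split from the export. The helper name set_lower_logname is hypothetical and is not part of the cozsftp script. This does not address the z/OS shell stopping without an exit code; it only avoids hiding a failure that the shell does report.

```shell
#!/bin/sh
# set_lower_logname is a hypothetical helper, not part of the cozsftp script.
set_lower_logname() {
    # Plain assignment: its exit status is that of the command substitution
    # (here, the last stage of the pipeline), not the status of export.
    lower=$(echo "${LOGNAME:-}" | tr '[:upper:]' '[:lower:]') || return 1
    LOWER_LOGNAME=$lower
    export LOWER_LOGNAME
}

if set_lower_logname; then
    echo "LOWER_LOGNAME=$LOWER_LOGNAME"
else
    echo "could not derive LOWER_LOGNAME" >&2
fi
```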
Re: Cond Code 0 after failure
Agreed. We have increased the process limit for the ftp userids so they don't hit "Max Processes", and raised PMR 39137,999,866 with IBM.
Re: Cond Code 0 after failure
IBM have responded on PMR 39137,999,866. They suggest using the 'return' shell command in any scripts we run where we need to check the exit status. The test I have just done using the following script still seems to give me rc=0 if I do an echo $? after it runs:
#!/bin/sh
set -x
set -e
x=$(command -v /a/b/c)
return
However if I call test.sh using another script as follows I do see a proper return code set (e.g. 126):
#!/bin/sh
set -x
set -e
echo 'Starting fork_script...'
test.sh
return
I've gone back to IBM, but if you have any further thoughts let us know.
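The observation above, that the status is visible when the failing script is called from another script, can be sketched as follows for a generic POSIX-style shell. The helper name run_child is hypothetical, and the inline script is only a stand-in for test.sh. Behavior on an unpatched z/OS shell may differ, and the exact non-zero value depends on the shell.

```shell
#!/bin/sh
# run_child is a hypothetical helper: run a script in a child shell and
# return that child's exit status to the caller.
run_child() {
    sh -c "$1"
}

# Stand-in for test.sh: a command substitution that fails under set -e.
status=0
run_child 'set -e; x=$(command -v /a/b/c); echo "not reached"' || status=$?
echo "child status=$status"
```

Because the parent reads the child's status explicitly, a failure in the child is reported by the parent instead of being lost when the child stops.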
Re: Cond Code 0 after failure
I don't see how using "return" helps the problem.
If you wouldn't mind, please email me the details from the PMR to info@dovetail.com
Re: Cond Code 0 after failure
Email on its way with the PMR contents.
Re: Cond Code 0 after failure
IBM is trying to say that this is "expected behavior"
Please reply to IBM and ask them to consider the following scenario:
A critical production batch job runs a shell script with BPXBATCH:
// EXEC PGM=BPXBATCH,PARM='SH importantscript.sh'
The "SH" option will run the user's login shell, which will cause the site's /etc/profile shell script to run BEFORE running importantscript.sh.
But (as you have seen yourself) - if /etc/profile does anything to cause a fork() that runs over a site or user limit, this will cause the job step to exit with CC=0 - without ever running importantscript.sh.
Will IBM be updating all of the documentation related to customizing z/OS Unix to make it clear that all shell scripts must be changed (by inserting "false" commands before ALL possible fork or spawns) so that shell scripts do not silently fail with exitCode=0?
I suppose that we have no choice other than to do this for our shell scripts, but this doesn't close all of the holes caused by this problem.
In effect, they are saying that if the z/OS Unix shell fails because it cannot fork, the fork error will not cause the exit code to be set, and that they think this is the correct behavior. This is ridiculous, and we have found that no other shell actually works that way.
IBM's stated position: "Documented in the USS Command Reference for 'sh', the exit status of the shell defaults to the exit status of the last command run by the shell. This default can be overridden by explicit use of the exit or return commands."
Re: Cond Code 0 after failure
PMR 39137,999,866 is now with the IBM z/OS Unix Development team
Re: Cond Code 0 after failure
IBM APAR OA47887 has been created to resolve this issue.
Re: Cond Code 0 after failure
See URL http://www-01.ibm.com/support/docview.w ... sg1OA47887 . The z/OS 2.1 fix is due 28th August.
Re: Cond Code 0 after failure
Formal fixes are now available:
R7A0 PSY UA77975 UP15/07/22 P F507
R780 PSY UA77976 UP15/07/22 P F507
R790 PSY UA77977 UP15/07/22 P F507
Re: Cond Code 0 after failure
Thanks for posting this information.
Have you tested these fixes and confirmed that they solve your problem?