fd_conn_rcv: Connection timed out
Hello Everyone,
I am new to this forum and just learning about Co:Z. When transferring files, I see the following error once the job has been running for longer than two hours:
todsn-client.E.: handleCmdIO: read error on fd_conn_rcv: Connection timed out
If the job is less than two hours it runs fine. Any help would be greatly appreciated.
Re: fd_conn_rcv: Connection timed out
What versions of the Co:Z Toolkit for z/OS and the Co:Z Target System Toolkit are you using?
Re: fd_conn_rcv: Connection timed out
We are using version 1.2.0.
Re: fd_conn_rcv: Connection timed out
And which version of Co:Z on z/OS?
Re: fd_conn_rcv: Connection timed out
CoZLauncher.N.: version: 1.7.8 2011-01-17
cozagent.N.: version: 1.2.0 2015-05-01
Re: fd_conn_rcv: Connection timed out
There have been several fixes for the launcher since this (old) version of Co:Z for z/OS:
http://dovetail.com/docs/cozinstall/changes.html
Please retry with the current version (currently 3.6.3).
Note: you can install an alternate version in a different directory and datasets for testing.
Re: fd_conn_rcv: Connection timed out
We upgraded to 3.6.3 and we still had the same issue when the job took longer than two hours.
Re: fd_conn_rcv: Connection timed out
- What settings are you using in DD:COZCONF (COZCFGD and COZCFG) ?
- What is the target operating system?
Re: fd_conn_rcv: Connection timed out
COZCONF settings:
server-path=/app/pp/cozr363/bin/cozserver
server-ports=8040-8059
ssh-tunnel=false
saf-cert=MY-RING:MY-CERT
agent-path=/opt/dovetail/coz/bin/cozagent
server-env-COZ_TRSUB_US-ASCII=ISO8859-1
target-env-COZ_CLIENT_CODEPAGE=ISO8859-1
Target Operating System:
RHEL5
Re: fd_conn_rcv: Connection timed out
We think that the problem is that something in your network path is timing out this particular socket after two hours.
I can't tell from the information that you have provided if the "todsn" that times out is for your file transfer.
More likely it is the internal todsn used for either DD:STDOUT or DD:STDERR redirection. It is common for these to send no data until a message is written on the target side to one of those standard handles, so the socket can sit idle for a long time.
If you are willing to test a beta release, we think it makes sense to change the socket options on these connections (using TCP_KEEPALIVE) so that they are less likely to be timed out.
Re: fd_conn_rcv: Connection timed out
We tried making the following changes:
z/OS changes applied:
TCP_KEEPALIVE increased to 240 minutes
Linux changes applied:
tcp_keepalive_time 14400
tcp_keepalive_intvl 75
tcp_keepalive_probes 90
and we got the following error:
Error message:
todsn(DD:STDOUT)[N]: 69454 bytes read; 587 records/68868 bytes written in 9746.222 seconds (7.126 Bytes/sec).
todsn-client(27282)[E]: handleCmdIO: read error on fd_conn_rcv: Connection timed out
todsn-client(27282)[E]: Error: no exit code received from CoZServer
[22:42:49.270522] CoZLauncher[D]: CoZAgent: completed with RC=103
[22:42:49.270590] CoZLauncher[T]: -> handleAgentCompletion(103)
cozagent[E]: STDERR DD Writer(27282) ended with RC=102
Any ideas from the error?
Re: fd_conn_rcv: Connection timed out
The tcp_keepalive_time needs to be lower than the time after which your firewall(s) time out the connection. 14400 is probably not low enough. In fact, the default is usually 2 hours, so you moved it in the wrong direction. Try something like 600 (10 minutes). Also, I don't believe that keepalives are enabled by default for sockets, so the setting may not matter, since Co:Z is not explicitly turning them on.
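To make the arithmetic concrete, here is a rough sketch in sh. The names keepalive_budget, idle_ok and FIREWALL_TIMEOUT are made up for illustration, and the 7200 second firewall timeout is only an assumption based on the two-hour failures reported in this thread; substitute whatever your network team tells you.

```shell
#!/bin/sh
# keepalive_budget IDLE INTVL CNT
# Print roughly how many seconds pass before the kernel gives up on a dead
# peer: IDLE seconds of silence, then CNT unanswered probes INTVL seconds apart.
keepalive_budget() {
    echo $(( $1 + $2 * $3 ))
}

# idle_ok IDLE FIREWALL_TIMEOUT
# Succeed only if the first keepalive probe is sent before the firewall
# would expire the idle connection.
idle_ok() {
    [ "$1" -lt "$2" ]
}

# Assumption: the firewall drops idle connections after two hours.
FIREWALL_TIMEOUT=7200

for idle in 14400 600; do
    if idle_ok "$idle" "$FIREWALL_TIMEOUT"; then
        echo "tcp_keepalive_time=$idle: first probe is sent before the firewall timeout"
    else
        echo "tcp_keepalive_time=$idle: too high, the firewall drops the idle socket first"
    fi
done
```

On Linux the values currently in effect can be read with `sysctl net.ipv4.tcp_keepalive_time` (and likewise for tcp_keepalive_intvl and tcp_keepalive_probes). Keep in mind that these kernel values only apply to sockets that have keepalive turned on.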
We have tested an enhancement to Co:Z z/OS that will allow you to turn on TCP_KEEPALIVE for the sockets and set the initial interval (tcp_keepalive_time) to whatever you want. This will be much better than changing the kernel settings.
Note 1:
Until the release that includes these enhancements is available, the only way that I know for sure to turn on TCP_KEEPALIVE is to use libkeepalive on Linux.
See: http://www.tldp.org/HOWTO/html_single/T ... ive-HOWTO/
To use this you would need an executable shell script to run instead of cozagent:
#! /bin/sh
# Use libkeepalive to cause Co:Z sockets to enable TCP_KEEPALIVE with the specified intervals
# This script should be chmod 755
#
export LD_PRELOAD=libkeepalive.so
export KEEPCNT=20
export KEEPIDLE=180 # this must be lower than firewall expiration
export KEEPINTVL=60
exec /opt/dovetail/coz/bin/cozagent "$@"
Point the "agent-path" property to the full path name of this shell script.
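Before changing COZCONF it may be worth checking that the wrapper really is a full path to an executable file. A minimal sketch of such a check follows; agent_path_line is a made-up helper name, not part of Co:Z, and the wrapper path in the usage note is only an example location.

```shell
#!/bin/sh
# agent_path_line WRAPPER
# Print the COZCONF "agent-path" line for WRAPPER, or fail with a message on
# stderr if WRAPPER is not a full path to an executable regular file.
agent_path_line() {
    wrapper=$1
    case $wrapper in
        /*) ;;  # absolute path, as the agent-path property requires
        *)  echo "not a full path: $wrapper" >&2
            return 1 ;;
    esac
    if [ ! -f "$wrapper" ] || [ ! -x "$wrapper" ]; then
        echo "not an executable file: $wrapper" >&2
        return 1
    fi
    echo "agent-path=$wrapper"
}
```

For example, if the wrapper were saved as /opt/dovetail/coz/bin/cozagent-keepalive.sh (a hypothetical location), running `agent_path_line /opt/dovetail/coz/bin/cozagent-keepalive.sh` would print the line to paste into COZCONF in place of the existing agent-path setting.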
Note 2:
It is also possible that your firewalls are configured to ignore TCP keepalive packets when deciding whether a connection has expired. If this is the case, then TCP_KEEPALIVE will not help no matter what you set it to. This is probably not the case.
We will also add a feature that sends out actual data packets at the application level, for situations where TCP_KEEPALIVE is not enough.