@pstjohn
Created June 1, 2020 12:21
Job output from #1896
2020-06-01 08:14:35.016515: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-06-01 08:14:36.282819: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.7
2020-06-01 08:14:36.284155: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.7
2020-06-01 08:14:38.383051: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-01 08:14:38.808560: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties:
pciBusID: 0004:04:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:38.811250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 1 with properties:
pciBusID: 0004:05:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:38.813916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 2 with properties:
pciBusID: 0004:06:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:38.816509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 3 with properties:
pciBusID: 0035:03:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:38.819103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 4 with properties:
pciBusID: 0035:04:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:38.821699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 5 with properties:
pciBusID: 0035:05:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:38.821720: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-06-01 08:14:38.821769: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-01 08:14:38.823673: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-06-01 08:14:38.824287: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-06-01 08:14:38.826055: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-06-01 08:14:38.827488: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-06-01 08:14:38.827527: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-06-01 08:14:38.858674: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Adding visible gpu devices: 0, 1, 2, 3, 4, 5
2020-06-01 08:14:38.877150: I tensorflow/core/platform/profile_utils/cpu_utils.cc:101] CPU Frequency: 3450000000 Hz
2020-06-01 08:14:38.886590: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x13078e6f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-01 08:14:38.886618: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-06-01 08:14:39.781612: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1302aa9d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-06-01 08:14:39.781642: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla V100-SXM2-16GB, Compute Capability 7.0
2020-06-01 08:14:39.781652: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): Tesla V100-SXM2-16GB, Compute Capability 7.0
2020-06-01 08:14:39.781660: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (2): Tesla V100-SXM2-16GB, Compute Capability 7.0
2020-06-01 08:14:39.781669: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (3): Tesla V100-SXM2-16GB, Compute Capability 7.0
2020-06-01 08:14:39.781678: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (4): Tesla V100-SXM2-16GB, Compute Capability 7.0
2020-06-01 08:14:39.781686: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (5): Tesla V100-SXM2-16GB, Compute Capability 7.0
2020-06-01 08:14:39.788309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties:
pciBusID: 0004:04:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:39.791019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 1 with properties:
pciBusID: 0004:05:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:39.793724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 2 with properties:
pciBusID: 0004:06:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:39.796353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 3 with properties:
pciBusID: 0035:03:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:39.799004: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 4 with properties:
pciBusID: 0035:04:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:39.801638: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 5 with properties:
pciBusID: 0035:05:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:39.801665: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-06-01 08:14:39.801683: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-01 08:14:39.801706: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-06-01 08:14:39.801724: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-06-01 08:14:39.801741: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-06-01 08:14:39.801758: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-06-01 08:14:39.801771: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-06-01 08:14:39.833139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Adding visible gpu devices: 0, 1, 2, 3, 4, 5
2020-06-01 08:14:39.833173: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-06-01 08:14:44.115023: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1099] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-01 08:14:44.115066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] 0 1 2 3 4 5
2020-06-01 08:14:44.115077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 0: N Y Y Y Y Y
2020-06-01 08:14:44.115086: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 1: Y N Y Y Y Y
2020-06-01 08:14:44.115094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 2: Y Y N Y Y Y
2020-06-01 08:14:44.115102: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 3: Y Y Y N Y Y
2020-06-01 08:14:44.115110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 4: Y Y Y Y N Y
2020-06-01 08:14:44.115117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 5: Y Y Y Y Y N
2020-06-01 08:14:44.134613: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14754 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0004:04:00.0, compute capability: 7.0)
2020-06-01 08:14:44.137521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 14754 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0004:05:00.0, compute capability: 7.0)
2020-06-01 08:14:44.140423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 14754 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0004:06:00.0, compute capability: 7.0)
2020-06-01 08:14:44.143263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 14754 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0035:03:00.0, compute capability: 7.0)
2020-06-01 08:14:44.146057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 14754 MB memory) -> physical GPU (device: 4, name: Tesla V100-SXM2-16GB, pci bus id: 0035:04:00.0, compute capability: 7.0)
2020-06-01 08:14:44.148863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 14754 MB memory) -> physical GPU (device: 5, name: Tesla V100-SXM2-16GB, pci bus id: 0035:05:00.0, compute capability: 7.0)
2020-06-01 08:14:44.163348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties:
pciBusID: 0004:04:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:44.166002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 1 with properties:
pciBusID: 0004:05:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:44.168650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 2 with properties:
pciBusID: 0004:06:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:44.171247: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 3 with properties:
pciBusID: 0035:03:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:44.173838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 4 with properties:
pciBusID: 0035:04:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:44.176425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 5 with properties:
pciBusID: 0035:05:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2020-06-01 08:14:44.176452: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-06-01 08:14:44.176473: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-01 08:14:44.176500: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-06-01 08:14:44.176520: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-06-01 08:14:44.176539: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-06-01 08:14:44.176559: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-06-01 08:14:44.176574: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-06-01 08:14:44.207708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Adding visible gpu devices: 0, 1, 2, 3, 4, 5
2020-06-01 08:14:44.207842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1099] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-01 08:14:44.207853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] 0 1 2 3 4 5
2020-06-01 08:14:44.207862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 0: N Y Y Y Y Y
2020-06-01 08:14:44.207870: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 1: Y N Y Y Y Y
2020-06-01 08:14:44.207879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 2: Y Y N Y Y Y
2020-06-01 08:14:44.207887: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 3: Y Y Y N Y Y
2020-06-01 08:14:44.207895: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 4: Y Y Y Y N Y
2020-06-01 08:14:44.207903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 5: Y Y Y Y Y N
2020-06-01 08:14:44.227005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:worker/replica:0/task:0/device:GPU:0 with 14754 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0004:04:00.0, compute capability: 7.0)
2020-06-01 08:14:44.229687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:worker/replica:0/task:0/device:GPU:1 with 14754 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0004:05:00.0, compute capability: 7.0)
2020-06-01 08:14:44.232360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:worker/replica:0/task:0/device:GPU:2 with 14754 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0004:06:00.0, compute capability: 7.0)
2020-06-01 08:14:44.234965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:worker/replica:0/task:0/device:GPU:3 with 14754 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0035:03:00.0, compute capability: 7.0)
2020-06-01 08:14:44.237573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:worker/replica:0/task:0/device:GPU:4 with 14754 MB memory) -> physical GPU (device: 4, name: Tesla V100-SXM2-16GB, pci bus id: 0035:04:00.0, compute capability: 7.0)
2020-06-01 08:14:44.240341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:worker/replica:0/task:0/device:GPU:5 with 14754 MB memory) -> physical GPU (device: 5, name: Tesla V100-SXM2-16GB, pci bus id: 0035:05:00.0, compute capability: 7.0)
2020-06-01 08:14:44.255922: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:300] Initialize GrpcChannelCache for job worker -> {0 -> localhost:2222}
2020-06-01 08:14:44.266449: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:390] Started server with target: grpc://localhost:2222
2020-06-01 08:14:58.502424: F tensorflow/core/framework/tensor_shape.cc:345] Check failed: size >= 0 (-79 vs. 0)
ERROR: One or more process (first noticed rank 0) terminated with signal 6