Skip to content
This repository was archived by the owner on Jul 16, 2020. It is now read-only.
This repository was archived by the owner on Jul 16, 2020. It is now read-only.

"Frame length should be positive" problem in XGBoost with CPU (Mortgage-large) #71

@peizhaoliu

Description

@peizhaoliu

dear author,

I came across this article "https://github.com/rapidsai/spark-examples/blob/master/getting-started-guides/on-prem-cluster/standalone-scala.md".
When i launch distributed training without GPUs (tree method hist), the parameters setting by following: "--num-executors 1 --executor-cores 19 --conf spark.cores.max=19 --conf spark.task.cpus=1 --class ai.rapids.spark.examples.mortgage.CPUMain -numWorkers=19 -treeMethod=hist"
However, tasks of the stage "foreachPartition at XGBoost.scala:703" always blocked in "running". In a few hours after submitted the job, we obtained the feeback:
java.lang.IllegalArgumentException: Frame length should be positive: -9223371863126827765 at org.spark_project.guava.base.Preconditions.checkArgument(Preconditions.java:119) at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:134) at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:81) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:138) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) at java.lang.Thread.run(Thread.java:748)

Could you please come up some tips about this issue? Thanks

sincerely

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions