Why does Spark fail with “Detected cartesian product for INNER join between logical plans”?












I am using Spark 2.1.0.



When I execute the following code, I get an error from Spark. Why, and how do I fix it?



val i1 = Seq(("a", "string"), ("another", "string"), ("last", "one")).toDF("a", "b")
val i2 = Seq(("one", "string"), ("two", "strings")).toDF("a", "b")
val i1Idx = i1.withColumn("sourceId", lit(1))
val i2Idx = i2.withColumn("sourceId", lit(2))
val input = i1Idx.union(i2Idx)
val weights = Seq((1, 0.6), (2, 0.4)).toDF("sourceId", "weight")
weights.join(input, "sourceId").show


Error:



scala> weights.join(input, "sourceId").show
org.apache.spark.sql.AnalysisException: Detected cartesian product for INNER join between logical plans
Project [_1#34 AS sourceId#39, _2#35 AS weight#40]
+- Filter (((1 <=> _1#34) || (2 <=> _1#34)) && (_1#34 = 1))
+- LocalRelation [_1#34, _2#35]
and
Union
:- Project [_1#0 AS a#5, _2#1 AS b#6]
: +- LocalRelation [_1#0, _2#1]
+- Project [_1#10 AS a#15, _2#11 AS b#16]
+- LocalRelation [_1#10, _2#11]
Join condition is missing or trivial.
Use the CROSS JOIN syntax to allow cartesian products between these relations.;
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$19.applyOrElse(Optimizer.scala:1011)
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$19.applyOrElse(Optimizer.scala:1008)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:287)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:329)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:329)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:277)
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts.apply(Optimizer.scala:1008)
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts.apply(Optimizer.scala:993)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:73)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:73)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:79)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:75)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:84)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:84)
at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2791)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2112)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2327)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
at org.apache.spark.sql.Dataset.show(Dataset.scala:636)
at org.apache.spark.sql.Dataset.show(Dataset.scala:595)
at org.apache.spark.sql.Dataset.show(Dataset.scala:604)
... 48 elided
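To see what the planner is doing, you can print the plans for the same query. This is only a diagnostic sketch: even when the optimizer raises this exception, the parsed and analyzed plans should still be printed, with the error text shown in place of the optimized plan.

// Print the parsed/analyzed/optimized/physical plans for the failing join.
// The optimized-plan section may only contain the AnalysisException message.
weights.join(input, "sourceId").explain(true)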









  • @JacekLaskowski I just downloaded Spark 2.1.0 from the official website and it presents the same issue (in the local shell). Spark 2.1.1 works fine instead.

    – Gevorg
    Jun 26 '17 at 17:16











  • Confirmed. I could reproduce it with 2.1.0 too. Yes, 2.1.1 works fine. Scala does not matter, as I use the official version that's built with Scala 2.11.8 (and that's why I removed it as "noise").

    – Jacek Laskowski
    Jun 26 '17 at 23:21
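Given how version-dependent the behaviour is, it is worth confirming which version the shell is actually running before comparing results. A minimal check (assuming the default spark session of spark-shell):

// Both report the version of the running Spark build.
println(spark.version)
println(org.apache.spark.SPARK_VERSION)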


















Tags: scala, apache-spark, apache-spark-sql






2 Answers
You can let the inner join go through by turning on the flag:



spark.conf.set("spark.sql.crossJoin.enabled", "true")
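The same property can also be set when the session is created instead of at runtime. A sketch (the property name is the one above; the app name is just a placeholder):

// At shell launch:
//   spark-shell --conf spark.sql.crossJoin.enabled=true

// Or when building the session in an application:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cross-join-enabled")  // placeholder app name
  .config("spark.sql.crossJoin.enabled", "true")
  .getOrCreate()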


You could also use an explicit cross join:



weights.crossJoin(input)


or pass the join condition and the join type ("cross") explicitly:



weights.join(input, input("sourceId")===weights("sourceId"), "cross")
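For reference, the Seq-based join signature lets you pick the join type while keeping a single sourceId column in the output; per the comments below, an explicit left_outer reportedly goes through on 2.1.0 even though inner does not (a sketch, not a guaranteed fix):

// join(right, usingColumns, joinType) deduplicates the join column in the output.
weights.join(input, Seq("sourceId"), "left_outer").show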


You can find more about the issue in SPARK-6459, which is said to be fixed in 2.1.1.



Since you have already confirmed that 2.1.1 works (see the comments on the question), upgrading should fix it.



Hope this helps!






  • Neither of the alternatives works for me... exact same exception. The issue that you pointed out is supposed to be already fixed in the version of Spark that I'm using.

    – Gevorg
    Jun 26 '17 at 15:04













  • I have encountered the same exception in the same version of Spark, and the explicit join condition worked for me.

    – Shankar Koirala
    Jun 26 '17 at 15:17











  • Which join are you trying to do? Try a specific join type: right, left, rightouter, leftouter, etc.

    – Shankar Koirala
    Jun 26 '17 at 15:39













  • I need to do an "inner" join and it does not work. No issue if I specify "left_outer"

    – Gevorg
    Jun 26 '17 at 15:48











  • I don't need a cross join. I need an inner join.

    – Gevorg
    Jun 26 '17 at 16:07



















tl;dr Upgrade to Spark 2.1.1. It's an issue in Spark that was fixed.



(I really wish I could also show you the exact change that fixed this in 2.1.1.)
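If your application declares Spark as a dependency, bumping the version should be enough to pick up the fix. A sketch using sbt coordinates (the Scala 2.11 build mentioned in the comments above):

// build.sbt: move the Spark SQL dependency from 2.1.0 to 2.1.1
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.1"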






  • Is there a workaround if we are not able to upgrade to Spark 2.1.1?

    – Bernard Jesop
    Sep 1 '17 at 12:14






  • Do you know how the issue was fixed? I have Spark 2.2 and I am still getting this issue, where Spark misinterprets a regular join as a cartesian product.

    – nir
    Jul 19 '18 at 23:45










