Why does Spark fail with “Detected cartesian product for INNER join between logical plans”?












I am using Spark 2.1.0.



When I execute the following code, I get an error from Spark. Why, and how do I fix it?



val i1 = Seq(("a", "string"), ("another", "string"), ("last", "one")).toDF("a", "b")
val i2 = Seq(("one", "string"), ("two", "strings")).toDF("a", "b")
val i1Idx = i1.withColumn("sourceId", lit(1))
val i2Idx = i2.withColumn("sourceId", lit(2))
val input = i1Idx.union(i2Idx)
val weights = Seq((1, 0.6), (2, 0.4)).toDF("sourceId", "weight")
weights.join(input, "sourceId").show


Error:



scala> weights.join(input, "sourceId").show
org.apache.spark.sql.AnalysisException: Detected cartesian product for INNER join between logical plans
Project [_1#34 AS sourceId#39, _2#35 AS weight#40]
+- Filter (((1 <=> _1#34) || (2 <=> _1#34)) && (_1#34 = 1))
+- LocalRelation [_1#34, _2#35]
and
Union
:- Project [_1#0 AS a#5, _2#1 AS b#6]
: +- LocalRelation [_1#0, _2#1]
+- Project [_1#10 AS a#15, _2#11 AS b#16]
+- LocalRelation [_1#10, _2#11]
Join condition is missing or trivial.
Use the CROSS JOIN syntax to allow cartesian products between these relations.;
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$19.applyOrElse(Optimizer.scala:1011)
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$19.applyOrElse(Optimizer.scala:1008)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:287)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:329)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:329)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:277)
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts.apply(Optimizer.scala:1008)
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts.apply(Optimizer.scala:993)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:73)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:73)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:79)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:75)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:84)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:84)
at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2791)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2112)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2327)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
at org.apache.spark.sql.Dataset.show(Dataset.scala:636)
at org.apache.spark.sql.Dataset.show(Dataset.scala:595)
at org.apache.spark.sql.Dataset.show(Dataset.scala:604)
... 48 elided
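To see what the planner is doing, you can print the plans for the same query. This is only a diagnostic sketch: even when the optimizer raises this exception, the parsed and analyzed plans should still be printed, with the error text shown in place of the optimized plan.

// Print the parsed/analyzed/optimized/physical plans for the failing join.
// The optimized-plan section may only contain the AnalysisException message.
weights.join(input, "sourceId").explain(true)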









  • @JacekLaskowski I just downloaded Spark 2.1.0 from the official website and it presents the same issue (in the local shell). Spark 2.1.1 works fine instead.

    – Gevorg
    Jun 26 '17 at 17:16











  • Confirmed. I could reproduce it with 2.1.0 too. Yes, 2.1.1 works fine. Scala does not matter, as I use the official version that's built with Scala 2.11.8 (and that's why I removed it as "noise").

    – Jacek Laskowski
    Jun 26 '17 at 23:21
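Given how version-dependent the behaviour is, it is worth confirming which version the shell is actually running before comparing results. A minimal check (assuming the default spark session of spark-shell):

// Both report the version of the running Spark build.
println(spark.version)
println(org.apache.spark.SPARK_VERSION)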


















Tags: scala, apache-spark, apache-spark-sql






2 Answers
You can let the inner join go through by turning on the flag:



spark.conf.set("spark.sql.crossJoin.enabled", "true")
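The same property can also be set when the session is created instead of at runtime. A sketch (the property name is the one above; the app name is just a placeholder):

// At shell launch:
//   spark-shell --conf spark.sql.crossJoin.enabled=true

// Or when building the session in an application:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cross-join-enabled")  // placeholder app name
  .config("spark.sql.crossJoin.enabled", "true")
  .getOrCreate()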


You could also use an explicit cross join:



weights.crossJoin(input)


or pass the join condition and the join type ("cross") explicitly:



weights.join(input, input("sourceId")===weights("sourceId"), "cross")
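For reference, the Seq-based join signature lets you pick the join type while keeping a single sourceId column in the output; per the comments below, an explicit left_outer reportedly goes through on 2.1.0 even though inner does not (a sketch, not a guaranteed fix):

// join(right, usingColumns, joinType) deduplicates the join column in the output.
weights.join(input, Seq("sourceId"), "left_outer").show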


You can find more about the issue in SPARK-6459, which is said to be fixed in 2.1.1.



Since you have already confirmed that 2.1.1 works (see the comments on the question), upgrading should fix it.



Hope this helps!






  • Neither of the alternatives works for me... exact same exception. The issue that you pointed out is supposed to be already fixed in the version of Spark that I'm using.

    – Gevorg
    Jun 26 '17 at 15:04













  • I have encountered the same exception in the same version of Spark, and the explicit join condition worked for me.

    – Shankar Koirala
    Jun 26 '17 at 15:17











  • Which join are you trying to do? Try a specific join type: right, left, rightouter, leftouter, etc.

    – Shankar Koirala
    Jun 26 '17 at 15:39













  • I need to do an "inner" join and it does not work. No issue if I specify "left_outer"

    – Gevorg
    Jun 26 '17 at 15:48











  • I don't need a cross join. I need an inner join.

    – Gevorg
    Jun 26 '17 at 16:07



















tl;dr Upgrade to Spark 2.1.1. It's an issue in Spark that was fixed.



(I really wish I could also show you the exact change that fixed this in 2.1.1.)
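If your application declares Spark as a dependency, bumping the version should be enough to pick up the fix. A sketch using sbt coordinates (the Scala 2.11 build mentioned in the comments above):

// build.sbt: move the Spark SQL dependency from 2.1.0 to 2.1.1
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.1"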






  • Is there a workaround if we are not able to upgrade to Spark 2.1.1?

    – Bernard Jesop
    Sep 1 '17 at 12:14






  • Do you know how the issue was fixed? I have Spark 2.2 and I am still getting this issue, where Spark misinterprets a regular join as a cartesian product.

    – nir
    Jul 19 '18 at 23:45










