Why does Spark fail with “Detected cartesian product for INNER join between logical plans”?
I am using Spark 2.1.0.
When I execute the following code, I get an error from Spark. Why, and how can I fix it?
val i1 = Seq(("a", "string"), ("another", "string"), ("last", "one")).toDF("a", "b")
val i2 = Seq(("one", "string"), ("two", "strings")).toDF("a", "b")
val i1Idx = i1.withColumn("sourceId", lit(1))
val i2Idx = i2.withColumn("sourceId", lit(2))
val input = i1Idx.union(i2Idx)
val weights = Seq((1, 0.6), (2, 0.4)).toDF("sourceId", "weight")
weights.join(input, "sourceId").show
Error:
scala> weights.join(input, "sourceId").show
org.apache.spark.sql.AnalysisException: Detected cartesian product for INNER join between logical plans
Project [_1#34 AS sourceId#39, _2#35 AS weight#40]
+- Filter (((1 <=> _1#34) || (2 <=> _1#34)) && (_1#34 = 1))
+- LocalRelation [_1#34, _2#35]
and
Union
:- Project [_1#0 AS a#5, _2#1 AS b#6]
: +- LocalRelation [_1#0, _2#1]
+- Project [_1#10 AS a#15, _2#11 AS b#16]
+- LocalRelation [_1#10, _2#11]
Join condition is missing or trivial.
Use the CROSS JOIN syntax to allow cartesian products between these relations.;
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$19.applyOrElse(Optimizer.scala:1011)
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$19.applyOrElse(Optimizer.scala:1008)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:287)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:329)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:329)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:277)
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts.apply(Optimizer.scala:1008)
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts.apply(Optimizer.scala:993)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:73)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:73)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:79)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:75)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:84)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:84)
at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2791)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2112)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2327)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
at org.apache.spark.sql.Dataset.show(Dataset.scala:636)
at org.apache.spark.sql.Dataset.show(Dataset.scala:595)
at org.apache.spark.sql.Dataset.show(Dataset.scala:604)
... 48 elided
Tags: scala, apache-spark, apache-spark-sql
@JacekLaskowski I just downloaded Spark 2.1.0 from the official website and it presents the same issue (in the local shell). Spark 2.1.1 works fine instead.
– Gevorg, Jun 26 '17 at 17:16
Confirmed. I could reproduce it with 2.1.0 too. Yes, 2.1.1 works fine. Scala does not matter as I use the official version that's built with Scala 2.11.8 (and that's why I removed it as "noise").
– Jacek Laskowski, Jun 26 '17 at 23:21
2 Answers
You can trigger the inner join after turning on the flag:
spark.conf.set("spark.sql.crossJoin.enabled", "true")
You could also use a cross join:
weights.crossJoin(input)
or reference the join columns through their DataFrames and pass the join type explicitly:
weights.join(input, input("sourceId")===weights("sourceId"), "cross")
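For reference, here is a minimal end-to-end sketch of the flag-based workaround. It assumes a spark-shell 2.1.0 session where the spark session object and the toDF implicits are available; it illustrates the approach above and is not a guaranteed fix.
import org.apache.spark.sql.functions.lit
import spark.implicits._

// Tell the optimizer not to reject plans it classifies as cartesian products.
// Note this is a session-wide setting.
spark.conf.set("spark.sql.crossJoin.enabled", "true")

// Same data as in the question.
val i1 = Seq(("a", "string"), ("another", "string"), ("last", "one")).toDF("a", "b")
val i2 = Seq(("one", "string"), ("two", "strings")).toDF("a", "b")
val input = i1.withColumn("sourceId", lit(1)).union(i2.withColumn("sourceId", lit(2)))
val weights = Seq((1, 0.6), (2, 0.4)).toDF("sourceId", "weight")

// With the flag on, the optimizer's cartesian-product check should no longer
// reject the plan, and the inner join runs as originally written.
weights.join(input, "sourceId").show
Because the flag applies to the whole session, it also allows genuine cartesian products elsewhere, so it is best treated as a stop-gap.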
You can find more about the issue in SPARK-6459, which is said to be fixed in 2.1.1.
If you are already on 2.1.1, the issue should have been fixed.
Hope this helps!
Neither of the alternatives works for me... same exact exception. The issue that you pointed out is supposed to be already fixed in the version of Spark that I'm using.
– Gevorg, Jun 26 '17 at 15:04
I have encountered the same exception in the same version, and the alias approach worked for me.
– Shankar Koirala, Jun 26 '17 at 15:17
Which join are you trying to do? Try a specific join: right, left, rightouter, leftouter, etc.
– Shankar Koirala, Jun 26 '17 at 15:39
I need to do an "inner" join and it does not work. No issue if I specify "left_outer".
– Gevorg, Jun 26 '17 at 15:48
I don't need a cross join. I need an inner join.
– Gevorg, Jun 26 '17 at 16:07
tl;dr Upgrade to Spark 2.1.1. It's an issue in Spark that was fixed.
(I really wish I could also show you the exact change that fixed it in 2.1.1.)
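Not part of the original answer, but as a quick sanity check you can confirm which Spark version a given shell is actually running before deciding whether this bug applies (assuming a plain spark-shell session):
// Both print the version of the running Spark, e.g. "2.1.0" vs. "2.1.1".
println(spark.version)
println(spark.sparkContext.version)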
Is there a workaround if we are not able to upgrade to Spark 2.1.1?
– Bernard Jesop, Sep 1 '17 at 12:14
Do you know how the issue was fixed? I have Spark 2.2 and I am still getting this issue, where Spark misinterprets a regular join as a cartesian product.
– nir, Jul 19 '18 at 23:45