Writing partitioned dataset to HDFS/S3 with _SUCCESS file in each partition
When writing a partitioned dataset to HDFS/S3, a _SUCCESS file is written to the output directory upon successful completion. I'm curious whether there is a way to get a _SUCCESS file written to each partition directory instead.
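For context, a typical partitioned write looks something like the sketch below (the bucket, path, and column names are illustrative, not taken from the original post):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame(
        [("2018-04-25", 1), ("2018-04-26", 2)],
        ["ds", "value"],
    )

    # Partitioned write: Spark creates one sub-directory per distinct `ds` value,
    # but the _SUCCESS marker is committed only once, at the root output path.
    df.write.mode("overwrite").partitionBy("ds").parquet("s3a://bucket/table")

    # Resulting layout (roughly):
    # s3a://bucket/table/_SUCCESS
    # s3a://bucket/table/ds=2018-04-25/part-....parquet
    # s3a://bucket/table/ds=2018-04-26/part-....parquet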
apache-spark pyspark hdfs
asked Apr 26 '18 at 20:09 by femibyte
Stack Overflow is a site for programming and development questions. This question appears to be off-topic because it is not about programming or development. See What topics can I ask about here in the Help Center. Perhaps Super User or Unix & Linux Stack Exchange would be a better place to ask.
– jww
May 2 '18 at 4:44
@jww It is a perfectly valid question and definitely not something that could be answered on Super User or Unix & Linux. The context might not be obvious at first glance, but it is clear if you consider the tags.
– user6910411
May 2 '18 at 16:31
@femibyte Why would you need that? _SUCCESS marks completion of the job, and no partition can be considered complete until the whole job is. Is there any particular use case here?
– user6910411
May 2 '18 at 16:33
I want to be able to use the _SUCCESS flag as an indicator in a Luigi workflow where the pipeline writes to a new daily S3 partition. Because the location is partitioned, the _SUCCESS flag is created in the "folder" above rather than in the newly created partition directory itself.
– femibyte
May 2 '18 at 18:15
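(A common workaround for this kind of use case is to have the workflow itself drop an empty marker object into the partition prefix once the job finishes, and point the Luigi target at that key. A minimal sketch with hypothetical bucket and prefix names:)

    import boto3

    def write_success_marker(bucket, partition_prefix):
        """Write an empty _SUCCESS-style marker into a specific partition prefix.

        Hypothetical helper: bucket/prefix names are placeholders, and the marker
        is written by the orchestrator after the Spark job reports success,
        not by Spark itself.
        """
        s3 = boto3.client("s3")
        key = partition_prefix.rstrip("/") + "/_SUCCESS"
        s3.put_object(Bucket=bucket, Key=key, Body=b"")

    # e.g. after the daily job finishes:
    # write_success_marker("my-bucket", "table/ds=2018-05-02")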
I'm facing this problem for a daily ETL. I need to be able to keep a record of which ETLs succeeded, even while multiple ETLs may run at the same time or out of chronological order. Would love to see an elegant solution.
– matmat
Nov 14 '18 at 0:28
1 Answer
For the time being, you may be able to get your desired result by writing out files directly to path/to/table/partition_key1=foo/partition_key2=bar and not using the Parquet writer's partitionBy argument.

FWIW, I also believe that _SUCCESS files should be written out to every partition, especially given that SPARK-13207 and SPARK-20236 have been resolved.

answered Nov 14 '18 at 0:49 by matmat
I've filed a bug report for this.
– matmat
Nov 14 '18 at 1:47
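A minimal sketch of the direct-write approach described in the answer above (the partition column and paths are hypothetical). Because each partition value is written by its own job targeting its own directory, each directory ends up with its own _SUCCESS marker:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.parquet("s3a://bucket/staging")  # source data (placeholder path)

    base = "s3a://bucket/table"
    dates = [r["ds"] for r in df.select("ds").distinct().collect()]

    for ds in dates:
        # One write per partition value, targeting the partition directory
        # directly instead of using partitionBy; each write commits its own
        # _SUCCESS file inside that directory.
        (df.where(F.col("ds") == ds)
           .drop("ds")
           .write.mode("overwrite")
           .parquet(base + "/ds=" + ds))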