Object storage for a web application












I am currently working on a website where roughly 40 million documents and images will be served to its users. I need suggestions on which method is the most suitable for storing this content, subject to the following requirements.




  • The system should be highly available, scalable, and durable.

  • Files have to be stored permanently, and users should be able to modify them.

  • Due to client restrictions, third-party object storage providers such as Amazon S3 and CDNs are not suitable.

  • File sizes vary from 1 MB to 30 MB (about 90% of the files are under 2 MB).

  • Content-retrieval latency is not much of a concern, so indexing and caching are not very important.


I did some research and found the following solutions:




  • Storing content as BLOBs in databases.

  • Using GridFS to chunk and store content.

  • Storing content in a file server in directories using a hash and storing the metadata in a database.

  • Using a distributed file system such as GlusterFS or HDFS and storing the file metadata in a database.


The website is developed in PHP, and Couchbase Community Edition is used as the database.



I would really appreciate any input.



Thank you.






























  • That's roughly 120 TB of data; where is this stored now?
    – Daan, Oct 30 at 10:02

  • I calculated that it could be around 100 TB of data. That's a lot. I would go for your third or fourth solution: file systems are good at retrieving files, and databases are good at keeping track of those files. Don't put the files themselves in a database; that would be a monstrosity.
    – KIKO Software, Oct 30 at 10:02

  • @Daan The feature to upload content is not yet implemented.
    – Panduka, Oct 30 at 10:11

  • @KIKOSoftware Thanks for the advice.
    – Panduka, Oct 30 at 10:14

















Tags: blob gridfs object-storage glusterfs distributed-filesystem






asked Oct 30 at 9:54 by Panduka; edited Nov 11 at 6:55 by Vitaly Isaev










2 Answers























Answer (score 2, accepted)










I have been working on a similar system for the last two years; the work is still in progress. However, the requirements are slightly different from yours: modifications are not possible (I will try to explain why later), file sizes range from several bytes to several megabytes, and, most importantly, deduplication has to be implemented at both the document and block levels. If two different users upload the same file to the storage, only one copy of the file should be kept. Likewise, if two different files partially overlap, only one copy of the common part should be stored.



But let's focus on your requirements, where deduplication is not needed. First of all, high availability implies replication. You'll have to store each file in several replicas (typically 2 or 3, though erasure-coding techniques can reduce the storage overhead) on independent machines, in order to stay alive if one of the storage servers in your backend dies. Also, given the estimated amount of data, it's clear that it won't fit on a single server, so vertical scaling is not possible and you have to consider partitioning. Finally, you need concurrency control to avoid race conditions when two different clients try to write or update the same data simultaneously. This topic is close to the concept of transactions (I don't mean ACID literally, but something close). To summarize, these facts mean that you are actually looking for a distributed database designed to store BLOBs.
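Partitioning is usually keyed off a hash of the object ID; consistent hashing keeps rebalancing cheap when storage nodes come and go. A minimal illustrative sketch in Python (the node names, the MD5 choice, and the vnode count are all assumptions for the example, not anything your stack prescribes):

```python
import hashlib
from bisect import bisect


class HashRing:
    """Minimal consistent-hash ring: adding or removing a node
    remaps only roughly 1/N of the keys, which keeps repartitioning cheap."""

    def __init__(self, nodes, vnodes=100):
        # Each node appears `vnodes` times on the ring to even out the load.
        self.ring = sorted(
            (int(hashlib.md5(f"{node}:{i}".encode()).hexdigest(), 16), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.points = [h for h, _ in self.ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first ring point at or after the key's hash.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        idx = bisect(self.points, h) % len(self.ring)
        return self.ring[idx][1]
```

A lookup like `HashRing(["node-a", "node-b", "node-c"]).node_for("some-file-id")` always returns the same node for the same key, and keys spread roughly evenly across nodes.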



One of the biggest problems in distributed systems is maintaining a global view of the system's state. In brief, there are two approaches:




  1. Choose a leader that communicates with the other peers and maintains the global state of the distributed system. This approach provides strong consistency and linearizability guarantees. The main disadvantage is that the leader becomes a single point of failure. If the leader dies, either some observer must assign the leader role to one of the replicas (the common case for master-slave replication in the RDBMS world), or the remaining peers must elect a new one (algorithms like Paxos and Raft are designed for exactly this). Either way, almost all incoming traffic goes through the leader, which leads to "hot spots" in the backend: CPU and IO load is unevenly distributed across the system. Note that Raft-based systems have very low write throughput (check the etcd and Consul limitations if you are interested).

  2. Avoid global state altogether. Weaken the guarantees to eventual consistency, and disallow in-place updates of files: if someone wants to edit a file, it is saved as a new file. Use a system organized as a peer-to-peer network, where no single peer keeps full track of the system, so there is no single point of failure. This yields high write throughput and good horizontal scalability.
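The second approach, saving an edit as a new file instead of updating in place, amounts to content-addressed, write-once storage. A toy sketch, where a plain dict stands in for any key-value backend (purely illustrative, not a real implementation):

```python
import hashlib


def store_immutable(store: dict, data: bytes) -> str:
    """Write-once storage: the key is derived from the content itself,
    so an 'edit' always produces a new object rather than an update."""
    key = hashlib.sha256(data).hexdigest()
    store.setdefault(key, data)  # idempotent: re-uploading is a no-op
    return key


store = {}
k1 = store_immutable(store, b"version 1")
k2 = store_immutable(store, b"version 2")  # the "edit" becomes a new object
assert k1 != k2 and store[k1] == b"version 1"
```

Because keys are content hashes, concurrent writers can never clobber each other's data; at worst two clients write identical bytes to the same key.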


So now let's discuss the options you've found:




Storing content as BLOBs in databases.




I don't think storing files in a traditional RDBMS is a good option: such databases are optimized for structured data and strong consistency, and you need neither of those here. You'll also have difficulties with backups and scaling. People usually don't use an RDBMS this way.




Using GridFS to chunk and store content.




GridFS is built on top of MongoDB. Again, this is a document-oriented database designed to store JSON documents, not BLOBs. MongoDB also had clustering problems for many years and passed the Jepsen tests only in 2017, which may mean its cluster support is not mature yet. Run performance and stress tests if you go this way.




Storing content in a file server in directories using a hash and storing the metadata in a database.




This option means you would be developing object storage on your own, so consider all the problems mentioned above.
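For reference, the hash-and-directories scheme from the question usually shards the content hash into nested directories so that no single directory accumulates millions of entries. A hedged Python sketch (the two-level `ab/cd/` layout and SHA-256 are conventional choices, not requirements):

```python
import hashlib
from pathlib import Path


def object_path(root: Path, digest: str) -> Path:
    # Shard by the first two hex-pairs so each directory stays small
    # (e.g. ab/cd/abcd1234... for 40M files: ~600 files per leaf dir).
    return root / digest[:2] / digest[2:4] / digest


def put(root: Path, data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()
    path = object_path(root, digest)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(data)
    tmp.replace(path)  # atomic rename: readers never see a partial file
    return digest
```

The digest returned by `put` is what you would record in the metadata database, alongside the user-visible file name and permissions.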




Using a distributed file system such as GlusterFS or HDFS and storing the file metadata in a database.




I have used neither of these solutions, but HDFS looks like overkill, because it makes you dependent on the Hadoop stack. I have no idea about GlusterFS performance. Always examine the design of a distributed file system: if it has some kind of dedicated "metadata" server, treat that as a single point of failure.



Finally, my thoughts on the solutions that may fit your needs:





  1. Elliptics. This object storage is not well known outside the Russian part of the Internet, but it's mature and stable, and its performance is excellent. It was developed at Yandex (the Russian search engine), and many Yandex services (Disk, Mail, Music, picture hosting and so on) are built on top of it. I used it in a previous project; it may take your ops team some time to get into it, but it's worth it if you're OK with the GPL license.


  2. Ceph. This is a real object storage system. It's also open source, but it seems that only Red Hat people know how to deploy and maintain it, so be prepared for vendor lock-in. I've also heard that its configuration is quite complicated. I've never used it in production, so I can't speak to its performance.


  3. Minio. This is an S3-compatible object storage system under active development. I've never used it in production either, but it seems well designed.


You may also check the Wikipedia page with the full list of available solutions.



And one last point: I strongly recommend against OpenStack Swift (there are many reasons why, but first of all, Python is just not well suited for these purposes).




































Answer (score 1)













    One probably-relevant question, whose answer I do not readily see in your post, is this:




    • How often do users actually "modify" the content?


    and:




    • When and if they do, how painful is it if a particular user is served "stale" content?




    Personally (and "categorically speaking"), I prefer to tackle such problems in two stages: (1) identifying the objects to be stored, e.g. using a database as an index; and (2) actually storing them, a task I wish to delegate to "a true file system, which after all specializes in such things."



    A database (it "offhand" seems to me) would be a very good way to handle the logical ("as seen by the user") taxonomy of the things you wish to store, while a distributed filesystem could handle the physical realities of storing the data and actually getting it to where it needs to go, and your application would be in the perfect position to gloss over all of those messy filesystem details.
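The two-stage split described above can be sketched in a few lines. Here SQLite and a local directory stand in for the real index database and the distributed filesystem; every name in this sketch is illustrative:

```python
import hashlib
import sqlite3
from pathlib import Path


def save(db: sqlite3.Connection, root: Path, name: str, data: bytes) -> str:
    """Stage 1: record the logical name in the database index.
    Stage 2: hand the bytes to the file system, keyed by content hash."""
    digest = hashlib.sha256(data).hexdigest()
    (root / digest).write_bytes(data)
    db.execute("INSERT INTO files(name, digest) VALUES (?, ?)", (name, digest))
    return digest


def load(db: sqlite3.Connection, root: Path, name: str) -> bytes:
    # The database resolves the user-visible name; the filesystem serves bytes.
    (digest,) = db.execute(
        "SELECT digest FROM files WHERE name = ?", (name,)
    ).fetchone()
    return (root / digest).read_bytes()
```

A "modification" in this scheme writes a new blob and re-points the name's `digest` row, which also answers the staleness question: readers holding the old digest simply see the old version until they re-resolve the name.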





























      up vote
      2
      down vote



      accepted










      I have been working on a similar system for last two years, the work is still in progress. However, requirements are slightly different from yours: modifications are not possible (I will try to explain why later), file sizes fall in range from several bytes to several megabytes, and, the most important one, the deduplication, which should be implemented both on the document and block levels. If two different users upload the same file to the storage, the only copy of the file should be kept. Also if two different files partially intersect with each other, it's necessary to store the only copy of the common part of these files.



      But let's focus on your requirements, so deduplication is not the case. First of all, high availability implies replication. You'll have to store your file in several replicas (typically 2 or 3, but there are techniques to decrease data parity) on independent machines in order to stay alive in case if one of the storage servers in your backend dies. Also, taking into account the estimation of the data amount, it's clear that all your data just won't fit into a single server, so vertical scaling is not possible and you have to consider partitioning. Finally, you need to take into account concurrency control to avoid race conditions when two different clients are trying to write or update the same data simultaneously. This topic is close to the concept of transactions (I don't mean ACID literally, but something close). So, to summarize, these facts mean that you're are actually looking for distributed database designed to store BLOBs.



      On of the biggest problems in distributed systems is difficulties with global state of the system. In brief, there are two approaches:




      1. Choose leader that will communicate with other peers and maintain global state of the distributed system. This approach provides strong consistency and linearizability guarantees. The main disadvantage is that in this case leader becomes the single point of failure. If leader dies, either some observer must assign leader role to one of the replicas (common case for master-slave replication in RDBMS world), or remaining peers need to elect new one (algorithms like Paxos and Raft are designed to target this issue). Anyway, almost whole incoming system traffic goes through the leader. This leads to the "hot spots" in backend: the situation when CPU and IO costs are unevenly distributed across the system. By the way, Raft-based systems have very low write throughput (check etcd and consul limitations if you are interested).

      2. Avoid global state at all. Weaken the guarantees to eventual consistency. Disable the update of files. If someone wants to edit the file, you need to save it as new file. Use the system which is organized as a peer-to-peer network. There is no peer in the cluster that keeps the full track of the system, so there is no single point of failure. This results in high write throughput and nice horizontal scalability.


      So now let's discuss the options you've found:




      Storing content as BLOBs in databases.




      I don't think it's a good option to store files in traditional RDBMS because they provide optimizations for structured data and strong consistency, and you don't need neither of this. Also you'll have difficulties with backups and scaling. People usually don't use RDBMS in this way.




      Using GridFS to chunk and store content.




      I'm not sure, but it looks like GridFS is built on the top of MongoDB. Again, this is document-oriented database designed to store JSONs, not BLOBs. Also MongoDB had problems with a cluster for many years. MongoDB passed Jepsen tests only in 2017. This may mean that MongoDB cluster is not mature yet. Make performance and stress tests, if you go this way.




      Storing content in a file server in directories using a hash and storing the metadata in a database.




      This option means that you need to develop object storage on your own. Consider all the problems I've mentioned above.




      Using a distributed file system such as GlusterFS or HDFS and storing the file metadata in a database.




      I used neither of these solutions, but HDFS looks like overkill, because you get dependent on Hadoop stack. Have no idea about GlusterFS performance. Always consider the design of distributed file systems. If they have some kind of dedicated "metadata" serves, treat it as a single point of failure.



      Finally, my thoughts on the solutions that may fit your needs:





      1. Elliptics. This object storage is not well-known outside of the russian part of the Internet, but it's mature and stable, and performance is perfect. It was developed at Yandex (russian search engine) and a lot of Yandex services (like Disk, Mail, Music, Picture hosting and so on) are built on the top of it. I used it in previous project, this may take some time for your ops to get into it, but it's worth it, if you're OK with GPL license.


      2. Ceph. This is real object storage. It's also open source, but it seems that only Red Hat people know how to deploy and maintain it. So get ready to a vendor lock. Also I heard that it have too complicated settings. Never used in production, so don't know about performance.


      3. Minio. This is S3-compatible object storage, under active development at the moment. Never used it in production, but it seems to be well-designed.


      You may also check wiki page with the full list of available solutions.



      And the last point: I strongly recommend not to use OpenStack Swift (there are lot of reasons why, but first of all, Python is just not good for these purposes).






      share|improve this answer



























        up vote
        2
        down vote



        accepted










        I have been working on a similar system for last two years, the work is still in progress. However, requirements are slightly different from yours: modifications are not possible (I will try to explain why later), file sizes fall in range from several bytes to several megabytes, and, the most important one, the deduplication, which should be implemented both on the document and block levels. If two different users upload the same file to the storage, the only copy of the file should be kept. Also if two different files partially intersect with each other, it's necessary to store the only copy of the common part of these files.



        But let's focus on your requirements, so deduplication is not the case. First of all, high availability implies replication. You'll have to store your file in several replicas (typically 2 or 3, but there are techniques to decrease data parity) on independent machines in order to stay alive in case if one of the storage servers in your backend dies. Also, taking into account the estimation of the data amount, it's clear that all your data just won't fit into a single server, so vertical scaling is not possible and you have to consider partitioning. Finally, you need to take into account concurrency control to avoid race conditions when two different clients are trying to write or update the same data simultaneously. This topic is close to the concept of transactions (I don't mean ACID literally, but something close). So, to summarize, these facts mean that you're are actually looking for distributed database designed to store BLOBs.



        On of the biggest problems in distributed systems is difficulties with global state of the system. In brief, there are two approaches:




        1. Choose leader that will communicate with other peers and maintain global state of the distributed system. This approach provides strong consistency and linearizability guarantees. The main disadvantage is that in this case leader becomes the single point of failure. If leader dies, either some observer must assign leader role to one of the replicas (common case for master-slave replication in RDBMS world), or remaining peers need to elect new one (algorithms like Paxos and Raft are designed to target this issue). Anyway, almost whole incoming system traffic goes through the leader. This leads to the "hot spots" in backend: the situation when CPU and IO costs are unevenly distributed across the system. By the way, Raft-based systems have very low write throughput (check etcd and consul limitations if you are interested).

        2. Avoid global state at all. Weaken the guarantees to eventual consistency. Disable the update of files. If someone wants to edit the file, you need to save it as new file. Use the system which is organized as a peer-to-peer network. There is no peer in the cluster that keeps the full track of the system, so there is no single point of failure. This results in high write throughput and nice horizontal scalability.


        So now let's discuss the options you've found:




        Storing content as BLOBs in databases.




        I don't think it's a good option to store files in traditional RDBMS because they provide optimizations for structured data and strong consistency, and you don't need neither of this. Also you'll have difficulties with backups and scaling. People usually don't use RDBMS in this way.




        Using GridFS to chunk and store content.




        I'm not sure, but it looks like GridFS is built on the top of MongoDB. Again, this is document-oriented database designed to store JSONs, not BLOBs. Also MongoDB had problems with a cluster for many years. MongoDB passed Jepsen tests only in 2017. This may mean that MongoDB cluster is not mature yet. Make performance and stress tests, if you go this way.




        Storing content in a file server in directories using a hash and storing the metadata in a database.




        This option means that you need to develop object storage on your own. Consider all the problems I've mentioned above.




        Using a distributed file system such as GlusterFS or HDFS and storing the file metadata in a database.




        I used neither of these solutions, but HDFS looks like overkill, because you get dependent on Hadoop stack. Have no idea about GlusterFS performance. Always consider the design of distributed file systems. If they have some kind of dedicated "metadata" serves, treat it as a single point of failure.



        Finally, my thoughts on the solutions that may fit your needs:





        1. Elliptics. This object storage is not well-known outside of the russian part of the Internet, but it's mature and stable, and performance is perfect. It was developed at Yandex (russian search engine) and a lot of Yandex services (like Disk, Mail, Music, Picture hosting and so on) are built on the top of it. I used it in previous project, this may take some time for your ops to get into it, but it's worth it, if you're OK with GPL license.


        2. Ceph. This is real object storage. It's also open source, but it seems that only Red Hat people know how to deploy and maintain it. So get ready to a vendor lock. Also I heard that it have too complicated settings. Never used in production, so don't know about performance.


        3. Minio. This is S3-compatible object storage, under active development at the moment. Never used it in production, but it seems to be well-designed.


        You may also check wiki page with the full list of available solutions.



        And the last point: I strongly recommend not to use OpenStack Swift (there are lot of reasons why, but first of all, Python is just not good for these purposes).






        share|improve this answer

























          up vote
          2
          down vote



          accepted







          up vote
          2
          down vote



          accepted






          I have been working on a similar system for last two years, the work is still in progress. However, requirements are slightly different from yours: modifications are not possible (I will try to explain why later), file sizes fall in range from several bytes to several megabytes, and, the most important one, the deduplication, which should be implemented both on the document and block levels. If two different users upload the same file to the storage, the only copy of the file should be kept. Also if two different files partially intersect with each other, it's necessary to store the only copy of the common part of these files.



          But let's focus on your requirements, so deduplication is not the case. First of all, high availability implies replication. You'll have to store your file in several replicas (typically 2 or 3, but there are techniques to decrease data parity) on independent machines in order to stay alive in case if one of the storage servers in your backend dies. Also, taking into account the estimation of the data amount, it's clear that all your data just won't fit into a single server, so vertical scaling is not possible and you have to consider partitioning. Finally, you need to take into account concurrency control to avoid race conditions when two different clients are trying to write or update the same data simultaneously. This topic is close to the concept of transactions (I don't mean ACID literally, but something close). So, to summarize, these facts mean that you're are actually looking for distributed database designed to store BLOBs.



          One of the biggest problems in distributed systems is maintaining the global state of the system. In brief, there are two approaches:




          1. Choose a leader that communicates with the other peers and maintains the global state of the distributed system. This approach provides strong consistency and linearizability guarantees. The main disadvantage is that the leader becomes a single point of failure. If the leader dies, either some observer must promote one of the replicas (the common case for master-slave replication in the RDBMS world), or the remaining peers must elect a new one (algorithms like Paxos and Raft are designed for this). Either way, almost all incoming traffic goes through the leader. This leads to "hot spots" in the backend: the situation where CPU and IO costs are unevenly distributed across the system. By the way, Raft-based systems have very low write throughput (check the etcd and consul limitations if you are interested).

          2. Avoid global state altogether. Weaken the guarantees to eventual consistency. Disallow in-place updates: if someone wants to edit a file, it is saved as a new file. Organize the system as a peer-to-peer network: no peer in the cluster keeps full track of the system, so there is no single point of failure. This yields high write throughput and good horizontal scalability.
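          The second approach can be sketched in a few lines: objects are immutable and addressed by the hash of their content, so an "edit" simply produces a new object and the application updates its metadata pointer. This is my own toy illustration (an in-memory dict stands in for a real storage backend):

```python
import hashlib


class ImmutableStore:
    """Content-addressed store: keys are SHA-256 of the bytes."""

    def __init__(self):
        self._blobs = {}  # in-memory stand-in for a storage backend

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # identical content stored once
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Any peer can verify a blob against its key independently
        assert hashlib.sha256(data).hexdigest() == key
        return data


store = ImmutableStore()
v1 = store.put(b"first draft")
v2 = store.put(b"edited draft")  # an "edit" is a brand-new object
assert v1 != v2 and store.get(v1) == b"first draft"
```

Because a blob never changes after it is written, replicas never have to coordinate on updates, which is exactly what removes the need for global state.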


          So now let's discuss the options you've found:




          Storing content as BLOBs in databases.




          I don't think storing files in a traditional RDBMS is a good option: RDBMSs are optimized for structured data and strong consistency, and you need neither. You'll also have difficulties with backups and scaling. People usually don't use an RDBMS this way.




          Using GridFS to chunk and store content.




          I'm not sure, but it looks like GridFS is built on top of MongoDB. Again, this is a document-oriented database designed to store JSON documents, not BLOBs. MongoDB also had clustering problems for many years; it passed the Jepsen tests only in 2017, which may mean that the MongoDB cluster is not yet mature. Run performance and stress tests if you go this way.
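          For context, the core idea of GridFS-style chunking is simple: split a file into fixed-size chunks so each stays under the database's document size limit, plus a manifest record tying the chunks together. The 255 KiB default matches GridFS; the dict layout below is my own sketch, not the actual GridFS driver API.

```python
# GridFS-style chunking sketch (illustrative, not the pymongo API)
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size


def chunk_file(file_id: str, data: bytes):
    """Split data into chunk documents plus a manifest document."""
    chunks = [
        {"files_id": file_id, "n": i, "data": data[off:off + CHUNK_SIZE]}
        for i, off in enumerate(range(0, len(data), CHUNK_SIZE))
    ]
    manifest = {"_id": file_id, "length": len(data), "chunkSize": CHUNK_SIZE}
    return manifest, chunks


manifest, chunks = chunk_file("doc-1", b"x" * (600 * 1024))
# 600 KiB at 255 KiB per chunk -> 3 chunk documents
assert b"".join(c["data"] for c in chunks) == b"x" * (600 * 1024)
```

With ~90% of your files under 2 MB, most would fit in a handful of chunks, so the chunking overhead itself is small; the concern is the cluster maturity, not the format.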




          Storing content in a file server in directories using a hash and storing the metadata in a database.




          This option means that you would need to develop an object storage system on your own, so consider all the problems I've mentioned above.
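          If you do go this route, the usual layout looks like the following sketch (my own illustration; the fan-out scheme and metadata fields are assumptions): the first hex bytes of the content hash pick a two-level directory fan-out, so no single directory ends up holding millions of files, and a metadata record maps the user-visible name to the physical path.

```python
import hashlib
import tempfile
from pathlib import Path


def blob_path(root: Path, data: bytes) -> Path:
    """Two-level fan-out: root/ab/cd/abcd... from the content hash."""
    h = hashlib.sha256(data).hexdigest()
    return root / h[:2] / h[2:4] / h


def store(root: Path, name: str, data: bytes, metadata_db: dict) -> None:
    path = blob_path(root, data)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    # A dict stands in for the real metadata database (e.g. Couchbase)
    metadata_db[name] = {"path": str(path), "size": len(data)}


db = {}
root = Path(tempfile.mkdtemp())
store(root, "invoice.pdf", b"%PDF-1.4 ...", db)
assert Path(db["invoice.pdf"]["path"]).read_bytes().startswith(b"%PDF")
```

Note that this sketch covers only a single machine; the replication, partitioning, and concurrency-control problems above are exactly what you would still have to build around it.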




          Using a distributed file system such as GlusterFS or HDFS and storing the file metadata in a database.




          I have used neither of these solutions, but HDFS looks like overkill, because you become dependent on the Hadoop stack. I have no idea about GlusterFS performance. Always consider the design of a distributed file system: if it has some kind of dedicated "metadata" servers, treat them as a single point of failure.



          Finally, my thoughts on the solutions that may fit your needs:





          1. Elliptics. This object storage is not well known outside the Russian part of the Internet, but it's mature and stable, and its performance is excellent. It was developed at Yandex (the Russian search engine), and a lot of Yandex services (Disk, Mail, Music, picture hosting and so on) are built on top of it. I used it in a previous project; it may take your ops team some time to get into it, but it's worth it if you're OK with the GPL license.


          2. Ceph. This is a real object storage system. It's also open source, but it seems that only Red Hat people know how to deploy and maintain it, so be ready for vendor lock-in. I've also heard that its configuration is overly complicated. I've never used it in production, so I can't speak to its performance.


          3. Minio. This is an S3-compatible object storage system under active development. I've never used it in production, but it seems to be well designed.


          You may also check the Wikipedia page with the full list of available solutions.



          And the last point: I strongly recommend against OpenStack Swift (there are a lot of reasons why, but first of all, Python is just not a good fit for these purposes).






          edited Nov 5 at 19:50

























          answered Nov 5 at 17:53









          Vitaly Isaev

























              up vote
              1
              down vote













              One probably-relevant question, whose answer I do not readily see in your post, is this:




              • How often do users actually "modify" the content?


              and:




              • When and if they do, how painful is it if a particular user is served "stale" content?




              Personally (and, "categorically speaking"), I prefer to tackle such problems in two stages: (1) identifying the objects to be stored – e.g. using a database as an index; and (2) actually storing them, this being a task that I wish to delegate to "a true file-system, which after all specializes in such things."



              A database (it "offhand" seems to me ...) would be a very good way to handle the logical ("as seen by the user") taxonomy of the things you wish to store, while a distributed filesystem could handle the physical realities of storing the data and actually getting it to where it needs to go, and your application would be in the perfect position to gloss over all of those messy filesystem details.
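              The two-stage split described above can be sketched as follows (my own illustration, using SQLite as a stand-in index and a temp directory as the "filesystem that specializes in such things"; the table and column names are assumptions):

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path

# Stage 1: a database row holds the logical, user-visible name.
# Stage 2: the bytes live on the filesystem under a content-hash path.
root = Path(tempfile.mkdtemp())
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE objects (name TEXT PRIMARY KEY, path TEXT, sha256 TEXT)")


def save(name: str, data: bytes) -> None:
    digest = hashlib.sha256(data).hexdigest()
    path = root / digest
    path.write_bytes(data)  # the filesystem does what it is good at
    db.execute("INSERT OR REPLACE INTO objects VALUES (?, ?, ?)",
               (name, str(path), digest))


def load(name: str) -> bytes:
    path, = db.execute("SELECT path FROM objects WHERE name = ?",
                       (name,)).fetchone()
    return Path(path).read_bytes()


save("photos/cat.jpg", b"\xff\xd8jpeg-bytes")
assert load("photos/cat.jpg") == b"\xff\xd8jpeg-bytes"
```

Because the index row only points at a path, a "modification" can be handled by writing the new bytes and updating the pointer, which also answers the staleness question: a reader sees either the old version or the new one, never a half-written file.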






                  answered Nov 5 at 19:29









                  Mike Robinson






























