With the explosive growth of information on the Internet, storing massive volumes of webpage data has become a serious challenge. Traditional centralized storage and management schemes can no longer provide efficient, reliable, and stable service at this scale. This paper designs and implements a distributed platform model for storing massive webpage data. The model uses a Hadoop cluster, with the HBase database running on the HDFS distributed file system, to analyze, compute, and store massive data efficiently, and relies on the MapReduce computing model and the ZooKeeper coordination service to keep data writes both efficient and consistent. Experimental tests show that this storage model overcomes the low read/write efficiency and inconsistent data writes of traditional storage models, while offering good scalability, feasibility, stability, and reliability.
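To illustrate how webpage rows might be organized in an HBase table, a minimal sketch of a row-key scheme is shown below. The reversed-hostname convention is a common practice for web tables in the BigTable/HBase lineage; the abstract does not specify the platform's actual key design, so this is an illustrative assumption:

```python
from urllib.parse import urlparse

def webpage_row_key(url: str) -> bytes:
    """Build an HBase-style row key from a page URL by reversing the
    hostname components, so that pages from the same domain sort
    together lexicographically (a common web-table convention;
    assumed here for illustration, not taken from the paper)."""
    parts = urlparse(url)
    reversed_host = ".".join(reversed(parts.hostname.split(".")))
    path = parts.path or "/"
    return f"{reversed_host}{path}".encode("utf-8")

# Pages from the same site share a key prefix, which makes
# domain-scoped range scans over the table efficient:
webpage_row_key("http://news.example.com/2013/index.html")
# -> b"com.example.news/2013/index.html"
```

Because HBase stores rows sorted by key, this layout lets a MapReduce job scan all pages of one domain as a contiguous key range instead of filtering the whole table.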