MULTI-JOIN-ORDERING QUERY OPTIMIZATION ALGORITHM FOR HIVE WAREHOUSE WITH MAPREDUCE

Ms. Nisha Jain, Dr. Preeti Tiwari

First Published December 26, 2021

Authors
  1. Ms. Nisha Jain
  2. Dr. Preeti Tiwari
Affiliation
  • Research Scholar, Department of Computer Science, RTU, Kota
  • Associate Professor, Department of Computer Science, ISIM, Jaipur
Abstract
According to the Digital Report of July 2021, billions of users around the world use mobile phones,
the Internet, and social media every second. This huge range of heterogeneous digital data is called
Big Data and is measured in terabytes or petabytes. Conventional relational databases struggle to
handle such heterogeneous data for analytics, yet they remain in significant use as Big Data grows.
For SQL-based structured queries, Hadoop is one of the most prominent and well-suited solutions for
storing and processing Big Data. Hive supports SQL queries on Hadoop. The Hive warehouse is the
oldest SQL engine on top of the Hadoop framework, and it uses HDFS (Hadoop Distributed File System)
to store the processed data. On Hadoop, MapReduce is the execution engine that runs SQL-based
queries. In query optimization, join ordering plays a significant role because changing the order in
which tables are joined can greatly reduce the execution time of a query. The main problem with Hive
is that it does not optimize the join order of an SQL query and does not guarantee an optimal
execution plan. The time complexity of the join ordering problem is exponential (Shan & Chen, 2015).
The main focus of this paper is to find the best join ordering for the Hive query optimization problem
through appropriate search algorithms and to improve the performance of SQL-based Hive queries on a
MapReduce-based system.
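
To make the join-ordering argument concrete, the following minimal sketch compares left-deep join orders under a toy cost model. It is not the algorithm proposed in the paper: the table names, row counts, selectivities, and the cost function (sum of intermediate result sizes) are all hypothetical assumptions chosen for illustration, and the search is a plain exhaustive enumeration, which is exactly the approach whose cost grows too quickly for large queries.

```python
from itertools import permutations

# Hypothetical row counts for four tables (illustrative values only).
CARD = {"orders": 1_000_000, "customers": 50_000, "nations": 25, "regions": 5}

# Hypothetical join selectivities; table pairs not listed here have no join
# predicate and are treated as cross products (selectivity 1.0).
SEL = {
    frozenset(("orders", "customers")): 1 / 50_000,
    frozenset(("customers", "nations")): 1 / 25,
    frozenset(("nations", "regions")): 1 / 5,
}


def plan_cost(order):
    """Toy cost of a left-deep plan: the sum of intermediate result sizes."""
    joined = [order[0]]
    rows = CARD[order[0]]
    cost = 0.0
    for table in order[1:]:
        # Combine the selectivities of all predicates linking the new table
        # to the tables already joined.
        sel = 1.0
        for prev in joined:
            sel *= SEL.get(frozenset((prev, table)), 1.0)
        rows = rows * CARD[table] * sel
        cost += rows
        joined.append(table)
    return cost


def best_order(tables):
    """Exhaustive search over all n! left-deep orders (feasible only for small n)."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    tables = list(CARD)
    best = best_order(tables)
    print("best order:", best, "cost:", plan_cost(best))
    print("as written:", tuple(tables), "cost:", plan_cost(tuple(tables)))
```

With n tables there are n! left-deep orders to examine, so this enumeration is practical only for a handful of tables, which is why heuristic or randomized search algorithms are usually considered for the join ordering problem.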
Keywords

Big Data, Hadoop, Hive, HDFS, MapReduce, Query Optimization Technique

References
  1. Chandar, J. (2010). Join algorithms using map/reduce. Master's thesis, University of Edinburgh.
  2. Thusoo, A., Sarma, J. S., Jain, N., Shao, Z., Chakka, P., Zhang, N., & Murthy, R. (2010). Hive - a petabyte scale data warehouse using Hadoop. In 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010) (pp. 996-1005). IEEE.
  3. Olston, C., Reed, B., Srivastava, U., Kumar, R., & Tomkins, A. (2008). Pig Latin: A not-so-foreign language for data processing. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (pp. 1099-1110).
  4. Shaikh, A., & Jindal, R. (2012). Join query processing in MapReduce environment. In International Conference on Advances in Communication, Network, and Computing (pp. 275-281). Springer, Berlin, Heidelberg.
  5. Zhang, X., Chen, L., & Wang, M. (2012). Efficient multi-way theta-join processing using mapreduce. arXiv preprint arXiv:1208.0081.
  6. Okcan, A., & Riedewald, M. (2011). Processing theta-joins using mapreduce. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data (pp. 949-960).
  7. Afrati, F. N., & Ullman, J. D. (2011). Optimizing multiway joins in a map-reduce environment. IEEE Transactions on Knowledge and Data Engineering, 23(9), 1282-1298.
  8. Wu, S., Li, F., Mehrotra, S., & Ooi, B. C. (2011). Query optimization for massively parallel data processing. In Proceedings of the 2nd ACM Symposium on Cloud Computing (pp. 1-13).
  9. Ganguly, S., Hasan, W., & Krishnamurthy, R. (1992). Query optimization for parallel execution. In Proceedings of the 1992 ACM SIGMOD international conference on management of data (pp. 9-18).
  10. Chen, M. S., Yu, P. S., & Wu, K. L. (1992). Scheduling and processor allocation for parallel execution of multijoin queries. In [1992] Eighth International Conference on Data Engineering (pp. 58-67). IEEE.
  11. Kadkhodaei, H., & Mahmoudi, F. (2011). A combination method for join ordering problem in relational databases using genetic algorithm and ant colony. In 2011 IEEE International Conference on Granular Computing (pp. 312-317). IEEE.
  12. Chande, S. V., & Sinha, M. (2011). Genetic optimization for the join ordering problem of database queries. In 2011 Annual IEEE India Conference (pp. 1-5). IEEE.
  13. Bagui, S., & Devulapalli, K. (2018). Comparison of Hive
  14. Pal, S. (2016). SQL on Big Data: Technology, Architecture, and Innovation. Apress.
  15. Chen, Y., Qin, X., Bian, H., & Chen, J. (2014). A Study of SQL-on-Hadoop Systems. Big Data Benchmarks, Performance Optimization, and Emerging Hardware - 4th and 5th Workshops, pp. 154-166, BPOE 2014, Salt Lake City, USA, LNCS, Vol. 8807.
  16. Vissapragada, B. (2014). Optimizing SQL Query Execution over MapReduce (Doctoral dissertation, International Institute of Information Technology Hyderabad).
  17. Shan, Y., & Chen, Y. (2015). Scalable Query Optimization for Efficient Data Processing using MapReduce. In 2015 IEEE International Congress on Big Data (pp. 649-652). IEEE.