Movie Recommendations

Ever watched a movie on Netflix and then seen a list of recommendations pop up, urging you to watch similar movies? In this assignment, I used the MapReduce framework to recommend similar movies based on the ratings they were given. I used a series of mappers and reducers to compute similarity metrics for pairs of movies from a large dataset of movies and user ratings. In essence, for each pair of movies I found all the people who rated both, built a vector of those ratings for each movie, calculated similarity metrics between the two vectors, and then returned the pairs that scored above a certain threshold.

I used two different datasets: one with 190,000 ratings and one with over a million ratings. On the smaller dataset, I was able to run the MapReduce framework locally. For the larger one, I used Amazon's Elastic MapReduce to do my computations and save the results in the cloud. Each dataset had two data files: one for movies (containing movie id and movie title) and one for ratings (containing movie id, user id, and rating).

I had five steps of mapping and reducing to go from my initial setup to my final output.

  • In my 0th mapper, I parsed each line of the data files and output the movie_id along with the additional info (i.e. the title, or the user id and rating). In the reducing step, I emitted the movie title along with the user id and rating. In essence, this performed a join of the two files on movie id.
  • My 1st mapper grouped the list of values by movie title; this grouping was implicitly implemented by the framework. My reducer took as input a key of movie title and a list of all the user ids and ratings as values. I counted the number of people who rated each movie and output the user id as the key, with the movie title, rating, and number of raters as the value.
  • My 2nd mapper implicitly grouped by user id. In my reducer, I then used the inputs described above to find all pairs of movies the user had rated. I emitted each pair along with the user's ratings for both movies and the number of users who had rated each of them.
  • My 3rd mapper implicitly grouped by movie pair. My reducer took in the movie pair and the ratings, from which I formed a vector of ratings for each movie. It was in this step that I computed the similarity metrics described below. As the output of my reducer, I emitted the first movie as the key and the second movie and the metrics as the values.
  • My 4th mapper implicitly grouped by movie title. My reducer sorted the values by regularized correlation value and then returned each movie and its similar partner only when that value was above a certain threshold. My final output (for this reducer and the overall framework) was a key of movie title 1 and movie title 2, with values of correlation value, regularized correlation value, cosine similarity value, Jaccard similarity value, number of users who rated both movies, number of users who rated the first movie, and number of users who rated the second movie.
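To make the data flow of the first steps concrete, here is a minimal, framework-free sketch in plain Python. It is not the code used for the assignment: the toy data, the `group_by_key` helper (a stand-in for the grouping the framework does between a mapper and a reducer), and all variable names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy inputs; the real data files are not reproduced here.
movies = {1: "Movie A", 2: "Movie B", 3: "Movie C"}   # movie_id -> title
ratings = [(1, "u1", 5), (2, "u1", 4), (1, "u2", 3),
           (2, "u2", 2), (3, "u2", 4)]                 # (movie_id, user_id, rating)


def group_by_key(pairs):
    """Stand-in for the shuffle phase: collect all values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


# Step 0: join ratings with titles on movie_id.
joined = [(movies[movie_id], (user, rating)) for movie_id, user, rating in ratings]

# Step 1: per title, count the raters and re-key every rating by user.
by_user = []
for title, user_ratings in group_by_key(joined).items():
    n_raters = len(user_ratings)
    for user, rating in user_ratings:
        by_user.append((user, (title, rating, n_raters)))

# Step 2: per user, emit every pair of movies that user rated.
by_pair = []
for user, rated in group_by_key(by_user).items():
    for (t1, r1, n1), (t2, r2, n2) in combinations(sorted(rated), 2):
        by_pair.append(((t1, t2), (r1, r2, n1, n2)))

# Input to step 3: per movie pair, two aligned rating vectors.
pair_vectors = {}
for pair, rows in group_by_key(by_pair).items():
    pair_vectors[pair] = ([row[0] for row in rows], [row[1] for row in rows])

print(pair_vectors[("Movie A", "Movie B")])
```

Sorting each user's movies before pairing keeps the pair key in a consistent order, so ("Movie A", "Movie B") and ("Movie B", "Movie A") never end up as separate keys.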

Overall, using the similarity metrics, I was able to clump similar movies together across both datasets. A reader can tell that there were several good matches. For example, in my large dataset, I matched "Free Willy 2: The Adventure Home (1995)" with (the obvious) "Free Willy 3: The Rescue (1997)" and with "Home Alone 2: Lost in New York (1992)". It did great on children's movies, pairing "Bambi" with movies like "Snow White", "Pinocchio", and "Cinderella". It also picked up on series of movies, clumping "Dr. No" with "Tomorrow Never Dies" and other Bond movies.

Similarly, I saw good performance on my small dataset. On a more modern note, I saw movies like "127 Hours" paired with "Life of Pi", two films that are so similar that even IMDB recommends them. For action movies, it recommended "White House Down" for "A Good Day to Die Hard". It's easy to see that my recommendations picked up on subgenres like superhero films: it recommended both "The Amazing Spider Man" and "Man of Steel" for "Captain America: The First Avenger". Overall, for both datasets, the similar pairs I read through made sense as pairings.

Similarity Metrics

I used four different similarity metrics. All of the sample data below is of the form [movie_title1, movie_title2] and [correlation value, regularized correlation value, cosine similarity value, Jaccard similarity value, n, n1, n2], where n is the number of users who rated both movies, n1 the number who rated movie 1, and n2 the number who rated movie 2. As mentioned above, I only returned results that had a regularized correlation value above a threshold of 0.5.

The correlation value relies on vectors of user ratings for movies A and B. Correlation measures how dependent these two vectors are (beyond mere chance) by taking the covariance of the two vectors and dividing it by the product of their standard deviations. It relies on this vector of matched ratings to determine similarity. Another way of stating that is:

$$\operatorname{corr}(A, B) = \frac{\operatorname{cov}(A, B)}{\sigma_A \, \sigma_B}$$
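As an illustration of the correlation described above, here is a small Python sketch that computes it for two aligned rating vectors. The function name and the choice to return 0.0 when either vector has no variance (where the correlation is undefined) are assumptions for this sketch, not details taken from the assignment.

```python
from math import sqrt


def pearson(a, b):
    """Covariance of a and b divided by the product of their standard deviations."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # The 1/n factors of covariance and of both variances cancel, so they are omitted.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: treat the undefined case as "no correlation"
    return cov / sqrt(var_a * var_b)


print(pearson([1, 2, 3], [2, 4, 6]))
```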

Below are some examples of interesting movies that had high correlation values (the first number reported).

Likewise, the regularized correlation value also looks at the similarity between these two vectors of ratings, but it is more robust in that it accounts for pairs that have very few raters in common. Without regularization, we can get high correlation values that merely reflect a low number of shared ratings. It is interesting to note that certain movies from above (like Bicycle Thief, Going My Way) had a high correlation but a lower regularized correlation for exactly this reason. In calculating the regularized correlation, I used a prior correlation value of zero, which gives me the following equation:

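$$\rho_{\text{reg}} = \frac{n}{n + k} \, \rho$$

where $\rho$ is the correlation value, $n$ is the number of users who rated both movies, and $k$ is a fixed count of virtual ratings placed at the prior.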
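One common way to regularize a correlation toward a prior is to blend the two with a weight that grows with the number of shared raters. The sketch below assumes that form. The function name and the default `virtual_count=10` are illustrative assumptions; the text does not state the constant that was actually used.

```python
def regularized_correlation(corr, n_common, virtual_count=10, prior=0.0):
    """Shrink corr toward prior; pairs with few common raters are shrunk the most.

    virtual_count is an assumed constant: how many "virtual" ratings at the
    prior are mixed in. It must be positive.
    """
    if virtual_count <= 0:
        raise ValueError("virtual_count must be positive")
    weight = n_common / (n_common + virtual_count)
    return weight * corr + (1 - weight) * prior


# A perfect correlation backed by only 2 common raters is shrunk heavily,
# while the same correlation backed by 200 raters is barely changed.
print(regularized_correlation(1.0, 2))
print(regularized_correlation(1.0, 200))
```

With a prior of zero the second term vanishes, so the result is simply the correlation scaled by `n_common / (n_common + virtual_count)`.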

Below are some examples of interesting movies that had high regularized correlation values (the second number reported).

I also used cosine similarity to evaluate similarity. Cosine similarity is most easily understood by visualizing the two vectors: the metric is the cosine of the angle between the two ratings vectors. If the vectors point in exactly the same direction the value is 1, and if they are 'perpendicular' the value is 0. It is bounded between 0 and 1 since ratings are non-negative. It is similar to the above metrics in that it compares two vectors, but it is different in that it does so geometrically, using the angle between the vectors in space (not statistics like covariance and standard deviation) to measure the difference.

$$\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$
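The cosine of the angle between two rating vectors can be sketched in a few lines of Python. The function name and the decision to return 0.0 for an all-zero vector are assumptions for illustration.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)


print(cosine_similarity([5, 3], [4, 2]))
```

Because ratings are non-negative, every term of the dot product is non-negative, which is why the value never drops below 0.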

Below are some examples of interesting movies that had high cosine similarity values (the third number reported).

Lastly, I used the Jaccard similarity to measure movie pairs. In contrast to the previous metrics, it departs completely from the vector rating model. Instead, it simply looks at the number of people who rated both movies divided by the number of people who rated at least one of them. Essentially, this is saying that the mere fact that someone rated two movies makes them similar, regardless of the rating values. The strength of the similarity is based on the proportion of raters the two movies share out of all the people who rated either one.

$$J = \frac{n}{n_1 + n_2 - n}$$
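Using the three counts reported with each pair (n, n1, n2), the Jaccard similarity can be sketched as below. This assumes the standard set definition: raters of both movies divided by raters of either movie. The function and parameter names are illustrative.

```python
def jaccard(n_both, n1, n2):
    """Raters of both movies divided by raters of at least one of them."""
    union = n1 + n2 - n_both  # raters of both would otherwise be counted twice
    return n_both / union if union else 0.0


# 2 shared raters out of 4 + 6 - 2 = 8 distinct raters.
print(jaccard(2, 4, 6))
```

Since the union counts everyone who rated either movie, a pair where one movie is far more popular than the other gets a low score, which fits the observation that Jaccard values tend to be much lower than the other metrics.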

Below are some examples of interesting movies that had high Jaccard similarity values (the fourth number reported). Something interesting to note is that, in general, the Jaccard similarity was much lower than the other metrics. I've included obvious pairs (ones that most people would say are similar) below to highlight this difference.


International Symposium on Innovative and Interdisciplinary Applications of Advanced Technologies

IAT 2023: Advanced Technologies, Systems, and Applications VIII, pp 329–340

A Recommendation System for Movies by Using Hadoop Mapreduce

  • Dinko Omeragić (ORCID: orcid.org/0000-0002-7063-2666),
  • Aldin Beriša (ORCID: orcid.org/0000-0002-8235-6689),
  • Dino Kečo (ORCID: orcid.org/0000-0002-1583-242X),
  • Samed Jukić (ORCID: orcid.org/0000-0001-7931-4093) &
  • Bećir Isaković (ORCID: orcid.org/0000-0002-6085-4548)
  • Conference paper
  • First Online: 01 September 2023


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 644)

Recommendation systems have become an integral component of the sales strategies of many businesses. Due to the immense size of data sets, however, innovative algorithms such as collaborative filtering, clustering models, and search-based methods are utilized. This study intends to demonstrate the benefits of the Hadoop MapReduce framework and item-to-item collaborative filtering by developing a user-ratings-based recommendation system for a larger movie data set. The resulting system offers information on movies filtered by year, director name, or comparable movies based on user reviews. Thus, we have been able to deliver credible movie suggestions based on these lists. The evaluation indicates that the recommended approaches are accurate and reliable.



Author information

Authors and affiliations.

International Burch University, Sarajevo, Bosnia and Herzegovina

Dinko Omeragić, Aldin Beriša & Samed Jukić

University of Sarajevo, Sarajevo, Bosnia and Herzegovina

Dino Kečo & Bećir Isaković


Corresponding author

Correspondence to Dinko Omeragić.

Editor information

Editors and affiliations.

University of Sarajevo-Faculty of Civil Engineering, Sarajevo, Bosnia and Herzegovina

Naida Ademović

International Burch University, Francuske revolucije bb, Ilidža, Bosnia and Herzegovina

Jasmin Kevrić

University of Utah, Salt Lake City, UT, USA

Zlatan Akšamija


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper.

Omeragić, D., Beriša, A., Kečo, D., Jukić, S., Isaković, B. (2023). A Recommendation System for Movies by Using Hadoop Mapreduce. In: Ademović, N., Kevrić, J., Akšamija, Z. (eds) Advanced Technologies, Systems, and Applications VIII. IAT 2023. Lecture Notes in Networks and Systems, vol 644. Springer, Cham. https://doi.org/10.1007/978-3-031-43056-5_24


DOI : https://doi.org/10.1007/978-3-031-43056-5_24

Published : 01 September 2023

Publisher Name : Springer, Cham

Print ISBN : 978-3-031-43055-8

Online ISBN : 978-3-031-43056-5

eBook Packages : Intelligent Technologies and Robotics (R0)


How to find top-N records using MapReduce

Finding top 10 or 20 records from a large dataset is the heart of many recommendation systems and it is also an important attribute for data analysis. Here, we will discuss the two methods to find top-N records as follows.

Method 1: First, let’s find out top-10 most viewed movies to understand the methods and then we will generalize it for ‘n’ records.

Data format: movie_name and no_of_views (tab separated)

Approach used: TreeMap. The idea is to use the Mappers to find local top-10 records, since many Mappers may run in parallel on different blocks of a file. All these local top-10 records are then aggregated at the Reducer, where we find the global top 10 records for the file.

Notice: This approach only works if we assume that no two movies have the same number of views. Otherwise, only one of the tied movies will be returned.

Example: Assume that a file (30 TB) is divided into 3 blocks of 10 TB each, and each block is processed by a Mapper in parallel, so we find the (local) top 10 records for each block. This data then moves to the Reducer, where we find the actual top 10 records of the file movie.txt.


Mapper code:  

Explanation:  

The important point to note here is that we call context.write() in the cleanup() method, which runs only once, at the end of the Mapper's lifetime. Normally a Mapper processes one key-value pair at a time and writes it as intermediate output to local disk. Here, however, we have to process the whole block (all key-value pairs) to find the top 10 before writing any output, hence we call context.write() in cleanup().

Reducer code:  

Explanation: Same logic as the Mapper. Normally a Reducer processes one key-value pair at a time and writes it as final output to HDFS. Here, however, we have to process all key-value pairs to find the top 10 before writing any output, hence we use cleanup().
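The two-stage idea (local top-N per block, then a global top-N over the local results) can be sketched without Hadoop. The Python below is a stand-in, not the article's Java code: it uses `heapq.nlargest` rather than a TreeMap, and the function names and sample data are assumptions. Returning the result only after the whole block has been seen mirrors writing the output in cleanup().

```python
import heapq


def local_top_n(records, n):
    """Mapper side: keep only the n most viewed (name, views) records of one block."""
    return heapq.nlargest(n, records, key=lambda record: record[1])


def global_top_n(local_results, n):
    """Reducer side: merge every block's local top n and keep the overall top n."""
    merged = [record for block in local_results for record in block]
    return heapq.nlargest(n, merged, key=lambda record: record[1])


# Hypothetical blocks of (movie_name, no_of_views) records.
blocks = [
    [("A", 5), ("B", 9), ("X", 2)],
    [("C", 7), ("D", 1)],
]
local_results = [local_top_n(block, 2) for block in blocks]
print(global_top_n(local_results, 2))
```

Unlike a TreeMap keyed on the view count, this version keeps both records when two movies have the same number of views.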

Driver Code:  

Running the jar file:  

  • We export all the classes as jar files.
  • We move our file movie.txt from local file system to /geeksInput in HDFS. 
  • We now run the yarn services to run the jar file. 


  • We define our custom parameter using the set() method.
  • This value can then be accessed in any Mapper/Reducer by using the get() method.


IMDb Movie Reviews


The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The dataset contains an even number of positive and negative reviews. Only highly polarizing reviews are considered. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. The dataset contains additional unlabeled data.
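The labeling rule in this description can be written as a small function. This is only a sketch of the stated thresholds; the function name and the use of `None` for scores that fall between the two thresholds (and are therefore not labeled) are assumptions.

```python
def label_from_score(score):
    """Map a 1-10 review score to a sentiment label using the stated thresholds."""
    if score <= 4:
        return "negative"
    if score >= 7:
        return "positive"
    return None  # scores of 5 or 6 are not polarized enough to be labeled


print([label_from_score(score) for score in (2, 5, 9)])
```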


MapReduce a Comprehensive Review


A MapReduce program implemented in Java. It counts movies by genre in the IMDB database (0.5 million records) across the years 2001 - 2015.

harshchaludia/MapReduce-IMDB-Analysis

Map Reduce IMDB Analysis

In this project, we implemented a Map/Reduce program to find the number of movies with the genre combinations Comedy, Romance; Action, Thriller; and Adventure, Sci-Fi for the time periods [2001-2005], [2006-2010], and [2011-2015] using the IMDB dataset. In the second part of the project, we wrote SQL queries to find the top 5 and bottom 5 movies for one of the time periods and all the genre combinations. We also obtained the query plan using the ‘EXPLAIN PLAN’ command.

Getting Started

The instructions below will get a version of the project up and running on your local machine for development and testing purposes. See the following for notes on how to deploy the project on a live system.

Prerequisites

  • Ubuntu (version 14.04 or greater) or any Linux platform
  • Hadoop 2.9.1 or greater
  • Atom (file editor)
  • Java SE 8 Development Kit
  • Install VirtualBox and download the Ubuntu VDI image.
  • Proceed with the installation of a Hadoop single-node cluster.
  • After completing the installation, start the Hadoop daemons with the commands below. This starts all three nodes, namely the namenode, datanode, and secondary namenode: start-dfs.sh and start-yarn.sh
  • Hadoop uses the HDFS file system, so the input data must first be placed in HDFS.
  • hadoop fs -copyFromLocal imdb.txt /imdb/input
  • bin/hadoop com.sun.tools.javac.Main imdb.java
  • jar cf imdb.jar imdb*.class
  • hadoop jar project/imdb.jar project.imdb /imdb/input /imdb/output

Running the tests

After a successful run of your Hadoop program, download the output via "localhost:50070" in the browser.
