Commit 162fd5d

Add files via upload
1 parent 9aeb753 commit 162fd5d

1 file changed

+256
-0
lines changed

2. Moive Recommendation.ipynb

Lines changed: 256 additions & 0 deletions
@@ -0,0 +1,256 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Movie Recommendation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A project for practicing the data structures, methods, and functions of pandas and NumPy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The goal of the project is to recommend movies to a person, based on that person’s ratings and critics’ ratings of the movies.\n",
    "\n",
    "The following files are required to run the program:\n",
    "1. `IMDB.csv`: A table with movie information.\n",
    "2. `ratings.csv`: A table with ratings of all movies listed in the movies data\n",
    "   by 100 critics. Each column name in the critics data is the name of a critic.\n",
    "3. `pX.csv`: A table with one person’s ratings of a subset of the movies in the movies data set,\n",
    "   where X is a number. The column name in the file is the name of the person.\n",
    "\n",
    "\n",
    "All personal ratings are integers in the range 1 to 10."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**How the program works:**<br>\n",
    "1. The user is asked for the name of the subfolder of the current working directory where the files are stored, followed by the names of the movies, critics, and personal ratings files.\n",
    "2. The program determines and outputs the names of the three critics whose movie ratings are closest to the person’s ratings, based on the Euclidean distance metric.\n",
    "3. The ratings by the critics identified in step 2 are used to determine which movies to recommend. Information about the recommended movies is displayed as described below.<br>\n",
    "a. The recommendations must consist of the top-rated movies in each genre that the person has not yet rated, based on the average of the ratings given by the three critics identified in step 2.<br>\n",
    "b. Movie genre is determined by the `Genre1` column of the movies data.<br>\n",
    "c. Recommendations must be listed in alphabetical order by genre.<br>\n",
    "d. Missing data (e.g. running time) must be left out of the output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "def main():\n",
    "    '''\n",
    "    The main function that is called to start the program.\n",
    "    '''\n",
    "    filesNames = input('Please enter the name of the folder with files, the name of movies file, \\n'\n",
    "                       'the name of critics file, the name of personal ratings file, separated by spaces:\\n')\n",
    "    print()  # print a blank line\n",
    "    filesNamesLst = filesNames.split()  # split on any run of whitespace\n",
    "    currentWorkDir = os.getcwd()\n",
    "    subfolderName = filesNamesLst[0]\n",
    "    # create a DataFrame for movies with selected columns\n",
    "    movieFileName = filesNamesLst[1]\n",
    "    movieFilePath = os.path.join(currentWorkDir, subfolderName, movieFileName)\n",
    "    movieDataFrame = pd.read_csv(movieFilePath,\n",
    "                                 encoding = 'unicode_escape').loc[:, ['Title', 'Genre1', 'Year', 'Runtime']]\n",
    "    # create a DataFrame for critics' ratings\n",
    "    criticsFileName = filesNamesLst[2]\n",
    "    criticsFilePath = os.path.join(currentWorkDir, subfolderName, criticsFileName)\n",
    "    criticsDataFrame = pd.read_csv(criticsFilePath)\n",
    "    # create a DataFrame for personal ratings\n",
    "    personalFileName = filesNamesLst[3]\n",
    "    personalFilePath = os.path.join(currentWorkDir, subfolderName, personalFileName)\n",
    "    personalDataFrame = pd.read_csv(personalFilePath)\n",
    "    # call functions to run the program\n",
    "    topThreeCriticsLst = findClosestCritics(criticsDataFrame, personalDataFrame)\n",
    "    print(topThreeCriticsLst, '\\n')\n",
    "    movieRecommendation = recommendMovies(criticsDataFrame, personalDataFrame,\n",
    "                                          topThreeCriticsLst, movieDataFrame)\n",
    "    personName = personalDataFrame.columns[1]\n",
    "    printRecommendations(movieRecommendation, personName)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def findClosestCritics(criticsDataFrame, personalDataFrame):\n",
    "    '''\n",
    "    Return a list of the three critics whose movie ratings are most similar\n",
    "    to the person's ratings, based on Euclidean distance. The lower the\n",
    "    distance, the more similar the critic's ratings are to the person's.\n",
    "\n",
    "    Parameters:\n",
    "    criticsDataFrame - provides data about critics' ratings\n",
    "    personalDataFrame - provides data about personal ratings\n",
    "    '''\n",
    "\n",
    "    # merge the critics' and the person's ratings on the movie title\n",
    "    criticsPersonRating = pd.merge(criticsDataFrame, personalDataFrame, on = 'Title')\n",
    "    # a new DataFrame with only the critics' ratings (no Title column, no person's column)\n",
    "    criticRating = criticsPersonRating.iloc[:, 1:-1].copy()\n",
    "    # indexed by the movie titles\n",
    "    criticRating.index = criticsPersonRating['Title']\n",
    "    # the person's ratings as a Series, selected by the person's name\n",
    "    personRatingValue = criticsPersonRating[personalDataFrame.columns[1]].copy()\n",
    "    # keep the index the same as in the critics' ratings DataFrame\n",
    "    personRatingValue.index = criticsPersonRating['Title']\n",
    "    ratingDifference = criticRating.sub(personRatingValue, axis = 0)\n",
    "    eucliDistance = np.sqrt((ratingDifference**2).sum())  # one distance per critic\n",
    "    eucliDistance = eucliDistance.sort_values()  # sort from smallest to largest\n",
    "    # select the three critics with the smallest Euclidean distance\n",
    "    topThreeCritics = eucliDistance.iloc[:3]\n",
    "    # generate a list of the critics' names\n",
    "    topThreeCriticsLst = list(topThreeCritics.index.values)\n",
    "\n",
    "    return topThreeCriticsLst"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def recommendMovies(criticsDataFrame, personalDataFrame, topThreeCriticsLst, movieDataFrame):\n",
    "    '''\n",
    "    Compute the top-rated unwatched movies in each genre category,\n",
    "    based on the average of the three critics' ratings.\n",
    "\n",
    "    Parameters:\n",
    "    criticsDataFrame - provides data about critics' ratings\n",
    "    personalDataFrame - provides data about personal ratings\n",
    "    topThreeCriticsLst - a list of three critics, whose ratings of movies are most similar to\n",
    "                         those provided in the personal ratings data\n",
    "    movieDataFrame - provides data about movies info\n",
    "    '''\n",
    "    # prepare the DataFrames for critics' ratings, the person's ratings and movies,\n",
    "    # each indexed by movie title; set_index returns a new DataFrame, so the\n",
    "    # DataFrames passed in by the caller are not modified\n",
    "    criticsDataFrame = criticsDataFrame.set_index('Title')\n",
    "    personalDataFrame = personalDataFrame.set_index('Title')\n",
    "    movieDataFrame = movieDataFrame.set_index('Title')\n",
    "    # titles the person has not rated yet\n",
    "    unwatchedTitles = criticsDataFrame.index.difference(personalDataFrame.index)\n",
    "    # prepare the unwatched movie DataFrame with average ratings\n",
    "    # from the three critics whose ratings are similar to the person's\n",
    "    unwatchedCriticRating = criticsDataFrame.loc[unwatchedTitles]\n",
    "    topThreeCriticsRating = unwatchedCriticRating[topThreeCriticsLst]\n",
    "    averageCriticsRating = topThreeCriticsRating.mean(axis = 1).round(2)\n",
    "    movieDataFrame['Average Rating'] = averageCriticsRating\n",
    "    movieDataFrame = movieDataFrame.sort_values('Genre1')\n",
    "    movieRecommendation = movieDataFrame[movieDataFrame.groupby(by = 'Genre1')['Average Rating']\n",
    "                                         .transform('max') == movieDataFrame['Average Rating']]\n",
    "\n",
    "    return movieRecommendation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def printRecommendations(movieRecommendation, personName):\n",
    "    '''\n",
    "    Print all the recommended movies in alphabetical order by genre.\n",
    "\n",
    "    Parameters:\n",
    "    movieRecommendation - provides data about the recommended movies\n",
    "    personName - the name of the person for whom the recommendations are made\n",
    "    '''\n",
    "    print('Recommendations for ', personName, ':', sep = '')\n",
    "    # get the length of the longest title for formatting later\n",
    "    movieTitle = list(movieRecommendation.index.values)\n",
    "    longestTitle = len(max(movieTitle, key = len))\n",
    "    # get each field (i.e. title, genre etc.) and then print it in the designed format\n",
    "    for row in range(len(movieRecommendation)):\n",
    "        title = movieRecommendation.index[row]\n",
    "        genre1 = movieRecommendation.loc[title]['Genre1']\n",
    "        year = movieRecommendation.loc[title]['Year']\n",
    "        runTime = movieRecommendation.loc[title]['Runtime']\n",
    "        rating = movieRecommendation.loc[title]['Average Rating']\n",
    "        if pd.notnull(runTime):\n",
    "            print('\"', title, '\" ', (longestTitle - len(title))*' ', '(', genre1, '), ',\n",
    "                  'rating: ', rating, ', ', year, ', runs ', runTime, sep = '')\n",
    "        else:\n",
    "            print('\"', title, '\" ', (longestTitle - len(title))*' ',\n",
    "                  '(', genre1, '), ', 'rating: ', rating, ', ', year, sep = '')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Please enter the name of the folder with files, the name of movies file, \n",
      "the name of critics file, the name of personal ratings file, separated by spaces:\n",
      "data1 IMDB.csv ratings.csv p8.csv\n",
      "\n",
      "['Quartermaine', 'Arvon', 'Merrison'] \n",
      "\n",
      "Recommendations for Catulpa:\n",
      "\"Star Wars: The Force Awakens\" (Action), rating: 9.67, 2015, runs 136 min\n",
      "\"The Grand Budapest Hotel\" (Adventure), rating: 9.0, 2014, runs 99 min\n",
      "\"The Martian\" (Adventure), rating: 9.0, 2015, runs 144 min\n",
      "\"Kubo and the Two Strings\" (Animation), rating: 9.67, 2016\n",
      "\"How to Train Your Dragon\" (Animation), rating: 9.67, 2010\n",
      "\"Hacksaw Ridge\" (Biography), rating: 9.33, 2016, runs 139 min\n",
      "\"What We Do in the Shadows\" (Comedy), rating: 9.0, 2014\n",
      "\"Prisoners\" (Crime), rating: 8.33, 2013, runs 153 min\n",
      "\"Spotlight\" (Crime), rating: 8.33, 2015, runs 128 min\n",
      "\"The Perks of Being a Wallflower\" (Drama), rating: 9.67, 2012, runs 102 min\n",
      "\"Shutter Island\" (Mystery), rating: 8.33, 2010, runs 138 min\n"
     ]
    }
   ],
   "source": [
    "main()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
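
The notebook describes two steps: pick the three critics nearest to the person by Euclidean distance, then take the top-rated unrated movie in each genre from those critics' average ratings. Both can be tried on a small in-memory example without the CSV files. The following is a minimal sketch, not the notebook's code. The toy data, the critic and person names, and the helper names `closest_critics` and `top_per_genre` are all made up for illustration. It assumes, as the notebook does, that every table has a `Title` column and that the movies table has a `Genre1` column.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration (not taken from IMDB.csv or ratings.csv).
movies = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E", "F"],
    "Genre1": ["Action", "Action", "Drama", "Drama", "Action", "Action"],
})
critics = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E", "F"],
    "Ann": [9, 2, 8, 7, 6, 9],
    "Bob": [1, 9, 2, 3, 4, 1],
    "Cy": [8, 3, 9, 6, 7, 8],
    "Dee": [9, 3, 6, 10, 5, 9],
})
# The person has rated only movies A, B and C.
person = pd.DataFrame({"Title": ["A", "B", "C"], "Pat": [9, 3, 8]})


def closest_critics(critics, person, k=3):
    """Return the names of the k critics with the smallest Euclidean distance."""
    # Keep only the movies both sides have rated, indexed by title.
    merged = critics.merge(person, on="Title").set_index("Title")
    name = person.columns[1]
    # Subtract the person's ratings from every critic column, row by row.
    diff = merged.drop(columns=name).sub(merged[name], axis=0)
    distance = np.sqrt((diff ** 2).sum())  # one distance per critic
    return list(distance.nsmallest(k).index)


def top_per_genre(movies, critics, person, chosen):
    """Return the best unrated movie(s) per genre by the chosen critics' average."""
    unwatched = critics[~critics["Title"].isin(person["Title"])].set_index("Title")
    average = unwatched[chosen].mean(axis=1).round(2).rename("Average Rating")
    rated = movies.set_index("Title").join(average, how="inner")
    # transform("max") broadcasts each genre's maximum back to its rows.
    best = rated.groupby("Genre1")["Average Rating"].transform("max")
    return rated[rated["Average Rating"] == best].sort_values("Genre1")


chosen = closest_critics(critics, person)
recommendations = top_per_genre(movies, critics, person, chosen)
print(chosen)
print(recommendations)
```

With this toy data the distances are 1.0 for Ann, about 1.41 for Cy, 2.0 for Dee and about 11.66 for Bob, so the three chosen critics are Ann, Cy and Dee. Their averages for the unrated movies are 7.67 (D), 6.0 (E) and 8.67 (F), which makes F the Action recommendation and D the Drama recommendation. Comparing against the per-genre maximum, instead of taking a single row per group, keeps ties, which matches the notebook's output where one genre can list two movies.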
