Commit b44ec4d

Add files via upload
1 parent 2dd8f1c commit b44ec4d

1 file changed (+254, -0 lines)

2.Moive Recommendation.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Movie Recommendation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## This project practices the data structures, methods, and functions of pandas and NumPy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal of the project is to create movie recommendations for a person, based on the person’s and critics’ ratings of the movies.\n",
"\n",
"The following files are required to run the program:\n",
"1. `IMDB.csv`: a table with movie information.\n",
"2. `ratings.csv`: a table with ratings of all movies in the movies data by 100 critics.\n",
"   Apart from `Title`, each column name in the critics data is a critic’s name.\n",
"3. `pX.csv`: a table with one person’s ratings of a subset of the movies in the movies data set,\n",
"   where X is a number. Apart from `Title`, the file’s column name is the person’s name.\n",
"\n",
"All personal ratings are integers from 1 to 10."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**How the program works:** <br>\n",
"1. The user will be asked for the name of the subfolder (in the current working directory) where the files are stored, followed by the names of the movies, critics, and personal ratings files.\n",
"2. The program determines and prints the names of the three critics whose ratings are closest to the person’s ratings, measured by Euclidean distance over the movies the person has rated.\n",
"3. The ratings of the critics identified in step 2 are used to determine which movies to recommend. Recommended movies are displayed as described below.<br>\n",
"a. The recommendations must consist of the top-rated movies in each genre, among the movies the person has not rated, based on the average ratings by the three critics identified in step 2.<br>\n",
"b. A movie’s genre is given by the `Genre1` column of the movies data.<br>\n",
"c. Recommendations must be listed in alphabetical order by genre.<br>\n",
"d. Missing data (e.g., running time) should not be included."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"def main():\n",
"    '''\n",
"    The main function that is called to start the program.\n",
"    '''\n",
"    filesNames = input('Please enter the name of the folder with files, the name of movies file, \\n'\n",
"                       'the name of critics file, the name of personal ratings file, separated by spaces:\\n')\n",
"    print()  # print a blank line\n",
"    filesNamesLst = filesNames.split()  # split on any run of whitespace\n",
"    currentWorkDir = os.getcwd()\n",
"    subfolderName = filesNamesLst[0]\n",
"    # create a DataFrame for movies with selected columns\n",
"    movieFileName = filesNamesLst[1]\n",
"    movieFilePath = os.path.join(currentWorkDir, subfolderName, movieFileName)\n",
"    movieDataFrame = pd.read_csv(movieFilePath,\n",
"                                 encoding='unicode_escape').loc[:, ['Title', 'Genre1', 'Year', 'Runtime']]\n",
"    # create a DataFrame for critics' ratings\n",
"    criticsFileName = filesNamesLst[2]\n",
"    criticsFilePath = os.path.join(currentWorkDir, subfolderName, criticsFileName)\n",
"    criticsDataFrame = pd.read_csv(criticsFilePath)\n",
"    # create a DataFrame for personal ratings\n",
"    personalFileName = filesNamesLst[3]\n",
"    personalFilePath = os.path.join(currentWorkDir, subfolderName, personalFileName)\n",
"    personalDataFrame = pd.read_csv(personalFilePath)\n",
"    # call functions to run the program\n",
"    topThreeCriticsLst = findClosestCritics(criticsDataFrame, personalDataFrame)\n",
"    print(topThreeCriticsLst, '\\n')\n",
"    movieRecommendation = recommendMovies(criticsDataFrame, personalDataFrame,\n",
"                                          topThreeCriticsLst, movieDataFrame)\n",
"    personName = personalDataFrame.columns[1]\n",
"    printRecommendations(movieRecommendation, personName)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def findClosestCritics(criticsDataFrame, personalDataFrame):\n",
"    '''\n",
"    Return a list of the three critics whose movie ratings are most similar to\n",
"    those in the personal ratings data, based on Euclidean distance. The lower the\n",
"    distance, the closer (and thus more similar) the critic's ratings are to the person's.\n",
"    \n",
"    Parameters:\n",
"    criticsDataFrame - critics' ratings (a Title column plus one column per critic)\n",
"    personalDataFrame - personal ratings (a Title column plus one column named after the person)\n",
"    '''\n",
"    \n",
"    # merge the critics' and personal ratings on the movie title (an inner join, so only\n",
"    # movies the person has rated are kept) and index the result by title\n",
"    criticsPersonRating = pd.merge(criticsDataFrame, personalDataFrame, on='Title').set_index('Title')\n",
"    # critics' ratings only: every column except the person's, which comes last after the merge\n",
"    criticRating = criticsPersonRating.iloc[:, :-1]\n",
"    # the person's ratings, selected by the person's name\n",
"    personRatingValue = criticsPersonRating[personalDataFrame.columns[1]]\n",
"    # subtract the person's rating from each critic's rating, movie by movie\n",
"    ratingDifference = criticRating.sub(personRatingValue, axis=0)\n",
"    # one distance per critic: square root of the summed squared differences\n",
"    eucliDistance = np.sqrt((ratingDifference ** 2).sum())\n",
"    eucliDistance.sort_values(inplace=True)  # sort from smallest to largest distance\n",
"    # select the 3 critics with the smallest Euclidean distance\n",
"    topThreeCritics = eucliDistance.iloc[:3]\n",
"    topThreeCriticsLst = list(topThreeCritics.index.values)  # the critics' names\n",
"    \n",
"    return topThreeCriticsLst"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def recommendMovies(criticsDataFrame, personalDataFrame, topThreeCriticsLst, movieDataFrame):\n",
"    '''\n",
"    Compute the top-rated unwatched movies in each genre, based on the average\n",
"    of the three selected critics' ratings.\n",
"    \n",
"    Parameters:\n",
"    criticsDataFrame - critics' ratings\n",
"    personalDataFrame - personal ratings\n",
"    topThreeCriticsLst - a list of the three critics whose movie ratings are most similar to\n",
"                         those in the personal ratings data\n",
"    movieDataFrame - movie information (title, genre, year, running time)\n",
"    '''\n",
"    # index the critics', personal and movie DataFrames by movie title\n",
"    # (set_index returns new DataFrames, so the caller's DataFrames are not modified)\n",
"    criticsDataFrame = criticsDataFrame.set_index('Title')\n",
"    personalDataFrame = personalDataFrame.set_index('Title')\n",
"    movieDataFrame = movieDataFrame.set_index('Title')\n",
"    # average ratings of the unwatched movies by the three critics\n",
"    # whose ratings are most similar to the person's\n",
"    unwatchedCriticRating = criticsDataFrame.loc[criticsDataFrame.index.difference(personalDataFrame.index)]\n",
"    topThreeCriticsRating = unwatchedCriticRating[topThreeCriticsLst]\n",
"    averageCriticsRating = topThreeCriticsRating.mean(axis=1).round(2)\n",
"    # aligned on title, so movies the person has already rated get NaN\n",
"    movieDataFrame['Average Rating'] = averageCriticsRating\n",
"    movieDataFrame = movieDataFrame.sort_values('Genre1')\n",
"    # keep the movie(s) whose average rating equals the best in their genre (NaN never matches)\n",
"    bestInGenre = movieDataFrame.groupby('Genre1')['Average Rating'].transform('max')\n",
"    movieRecommendation = movieDataFrame[movieDataFrame['Average Rating'] == bestInGenre]\n",
"    \n",
"    return movieRecommendation"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def printRecommendations(movieRecommendation, personName):\n",
"    '''\n",
"    Print all the recommended movies in alphabetical order by genre.\n",
"    \n",
"    Parameters:\n",
"    movieRecommendation - the recommended movies (indexed by title) with their genre, year,\n",
"                          running time and average rating\n",
"    personName - the name of the person the recommendations are for\n",
"    '''\n",
"    print('Recommendations for ', personName, ':', sep='')\n",
"    # get the length of the longest title, used to align the output\n",
"    movieTitles = list(movieRecommendation.index.values)\n",
"    longestTitle = len(max(movieTitles, key=len))\n",
"    # get each field (title, genre, etc.) and print it in the designed format\n",
"    for title, movie in movieRecommendation.iterrows():\n",
"        genre = movie['Genre1']\n",
"        year = movie['Year']\n",
"        runTime = movie['Runtime']\n",
"        rating = movie['Average Rating']\n",
"        padding = (longestTitle - len(title)) * ' '\n",
"        if pd.notnull(runTime):\n",
"            print('\"', title, '\" ', padding,\n",
"                  '(', genre, '), ', 'rating: ', rating, ', ', year, ', runs ', runTime, sep='')\n",
"        else:  # omit the missing running time\n",
"            print('\"', title, '\" ', padding,\n",
"                  '(', genre, '), ', 'rating: ', rating, ', ', year, sep='')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Please enter the name of the folder with files, the name of movies file, \n",
"the name of critics file, the name of personal ratings file, separated by spaces:\n",
"data1 IMDB.csv ratings.csv p8.csv\n",
"\n",
"['Quartermaine', 'Arvon', 'Merrison'] \n",
"\n",
"Recommendations for Catulpa:\n",
"\"Star Wars: The Force Awakens\" (Action), rating: 9.67, 2015, runs 136 min\n",
"\"The Grand Budapest Hotel\" (Adventure), rating: 9.0, 2014, runs 99 min\n",
"\"The Martian\" (Adventure), rating: 9.0, 2015, runs 144 min\n",
"\"Kubo and the Two Strings\" (Animation), rating: 9.67, 2016\n",
"\"How to Train Your Dragon\" (Animation), rating: 9.67, 2010\n",
"\"Hacksaw Ridge\" (Biography), rating: 9.33, 2016, runs 139 min\n",
"\"What We Do in the Shadows\" (Comedy), rating: 9.0, 2014\n",
"\"Prisoners\" (Crime), rating: 8.33, 2013, runs 153 min\n",
"\"Spotlight\" (Crime), rating: 8.33, 2015, runs 128 min\n",
"\"The Perks of Being a Wallflower\" (Drama), rating: 9.67, 2012, runs 102 min\n",
"\"Shutter Island\" (Mystery), rating: 8.33, 2010, runs 138 min\n"
]
}
],
"source": [
"main()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
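Step 2 of the notebook ranks critics by the Euclidean distance between their ratings and the person's, computed over the movies the person has rated: d(c, p) = sqrt(sum over movies m of (r_cm - r_pm)^2). The sketch below is a standalone version of that computation, not the notebook's exact code. The critic names and ratings are made-up toy data, `closest_critics` is a hypothetical helper, and `Series.nsmallest` stands in for the notebook's sort-then-slice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the layout the notebook assumes: a 'Title' column
# followed by one column per critic (critics) or one column for the person.
critics = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Critic1": [8, 6, 9, 5],
    "Critic2": [3, 4, 2, 7],
    "Critic3": [7, 7, 8, 6],
})
person = pd.DataFrame({"Title": ["A", "B", "C"], "Pat": [8, 7, 9]})


def closest_critics(critics, person, k=3):
    """Return the k critic names with the smallest Euclidean distance to the person."""
    person_name = person.columns[1]
    # Inner join on Title keeps only the movies the person has rated.
    merged = critics.merge(person, on="Title").set_index("Title")
    critic_cols = [c for c in critics.columns if c != "Title"]
    # Subtract the person's rating from every critic column, row by row.
    diff = merged[critic_cols].sub(merged[person_name], axis=0)
    # Column-wise sum of squared differences gives one distance per critic.
    distance = np.sqrt((diff ** 2).sum())
    return list(distance.nsmallest(k).index)


print(closest_critics(critics, person, k=2))  # → ['Critic1', 'Critic3']
```

Critic1 differs from Pat by one point on a single movie (distance 1.0), and Critic3 differs by one point on two movies (distance about 1.41). Movie D is ignored because Pat has not rated it.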

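Step 3 of the notebook keeps the unwatched movie or movies in each genre whose average rating from the three closest critics equals the genre maximum. The following is a minimal sketch of that selection with `groupby(...).transform('max')`, assuming Title-indexed frames. The helper `top_per_genre` and all data below are hypothetical.

```python
import pandas as pd

# Hypothetical toy data mirroring the notebook's layout (frames indexed by Title).
movies = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E"],
    "Genre1": ["Drama", "Action", "Drama", "Action", "Comedy"],
    "Year": [2010, 2012, 2014, 2015, 2016],
}).set_index("Title")
critic_ratings = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E"],
    "Critic1": [9, 6, 7, 8, 5],
    "Critic3": [8, 7, 9, 8, 6],
}).set_index("Title")
watched = ["A"]  # titles the person has already rated
top_critics = ["Critic1", "Critic3"]


def top_per_genre(movies, critic_ratings, watched, top_critics):
    """Return the highest-rated unwatched movie(s) in each genre, sorted by genre."""
    unwatched = critic_ratings.index.difference(watched)
    avg = critic_ratings.loc[unwatched, top_critics].mean(axis=1).round(2)
    # Assigning a Series aligns on the Title index, so watched movies get NaN.
    result = movies.assign(**{"Average Rating": avg})
    best = result.groupby("Genre1")["Average Rating"].transform("max")
    # NaN never compares equal, so watched movies drop out here.
    return result[result["Average Rating"] == best].sort_values("Genre1")


print(top_per_genre(movies, critic_ratings, watched, top_critics))
```

Ties are kept. Every movie that matches its genre's maximum is returned, which is why the notebook's recorded output lists two Adventure movies rated 9.0.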

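`printRecommendations` pads each quoted title to the width of the longest one and leaves out the running time when it is missing. Below is one possible f-string version of that formatting. It is written as a hypothetical `format_recommendations` helper that returns the lines instead of printing them, so they are easy to check. The two sample rows copy values from the notebook's recorded output, and the second row has no runtime.

```python
import pandas as pd

# Hypothetical recommendations frame shaped like recommendMovies' result.
recs = pd.DataFrame({
    "Title": ["The Martian", "Kubo and the Two Strings"],
    "Genre1": ["Adventure", "Animation"],
    "Year": [2015, 2016],
    "Runtime": ["144 min", None],
    "Average Rating": [9.0, 9.67],
}).set_index("Title")


def format_recommendations(recs, person_name):
    """Build the output lines, padding titles so the genre column lines up."""
    width = max(len(title) for title in recs.index) + 2  # +2 for the quotes
    lines = [f"Recommendations for {person_name}:"]
    for title, row in recs.iterrows():
        quoted = f'"{title}"'.ljust(width)
        line = f"{quoted} ({row['Genre1']}), rating: {row['Average Rating']}, {row['Year']}"
        if pd.notnull(row["Runtime"]):  # skip missing running times
            line += f", runs {row['Runtime']}"
        lines.append(line)
    return lines


print("\n".join(format_recommendations(recs, "Catulpa")))
```

Returning a list rather than printing inside the loop keeps the formatting logic separate from the output, but the alignment rule is the same as the notebook's. Every genre starts at the column just past the longest quoted title.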