
ICT606 Machine Learning Report Sample
This assignment includes two main components. In part one, you are asked to complete a small-scale literature review on a popular topic explainable machine learning. In part two, you will be expected to delve deeper into the topic by exploring and implementing relevant techniques on your selected public dataset, so that you can create a prediction model that can provide explanations for its decisions, thereby achieving explainability.
The aim of this assignment is to enable you to compare and contrast the latest advancements in machine learning and evaluate their effectiveness in the practical application, aligning with the learning objectives 3 & 4.
Report guidelines:
- Word limit: 2,500 words (can be within a +/- 10% range of this word limit), excluding references, figures, and tables. The report should be formatted in Times New Roman 12 font with normal margins selected
- Do not include any form of code snippets directly into the report. All code should be included solely in the Python files submitted.
- It’s important to focus on being clear and concise in your writing and make the maximum use of well-designed visualization to help convey information in a more efficient and impactful way.
The following report outline is suggested:
Introduction: Define the topic of your study and provide any relevant background information that helps your reader to understand the topic.
Small-scale literature review: You are required to review at least five computing JOURNAL articles published date after 2020 that focus on achieving explainable machine learning techniques for predictive modelling. Be sure that you have organized your selected articles into meaning ways. Give a summary of each article with a specific focus on the explainable machine learning techniques they used, explainability type achieved, advantages and weaknesses, etc. you can also compare, contrast and/or connect the articles you have selected.
Experimental design: select THREE appropriate explainable machine learning techniques and implement them on a public dataset you select; design your experiments by following the standard machine learning pipelines, such as data pre-processing, predictive model construction, performance evaluation, etc.
Experimental results and discussion: What are the findings of the experiments? What level of ‘explainability’ is obtained? How do you interpret the results? Engagingly present your results using text and visualizations.
Conclusion: Summarize the project and highlight any limitations or shortcomings of the project and suggest potential improvements for future work.
References: Include a listing of all references that you mentioned in your paper. Please use IEEE reference style when completing this list.
Solution
Introduction
In recent years, the utility of device mastering strategies in real property has received huge traction, imparting promising avenues for predicting assets values and informing investment selections. However, as system mastering models become an increasing number of complex, there's a growing want for transparency and interpretability in their selection-making methods. This is in particular essential in the area of real estate, wherein stakeholders require no longer best correct predictions however also insights into the factors using the ones predictions.
The concept of explainable device studying (XML) has emerged as a vital vicinity of research, aiming to enhance the transparency and interpretability of gadget studying fashions. In the context of real property predictive modeling, XML techniques enable stakeholders, which include buyers, agents, and policymakers, to recognize how predictions are generated and which capabilities make contributions most significantly to those predictions. This information is priceless for building trust within the predictive models and facilitating informed decision-making in real estate transactions [6].
This document makes a speciality of exploring and enforcing XML techniques for predictive modeling in real estate. We begin with the aid of imparting a small-scale literature evaluation, summarizing latest studies articles that highlight numerous XML strategies and their programs in real property predictive modeling. We then continue to layout and behavior experiments the usage of 3 decided on XML strategies on a public dataset. Through those experiments, we aim to evaluate the effectiveness of the strategies in attaining explainability whilst keeping predictive accuracy.
The final goal of this look at is to make contributions to the advancement of transparent and interpretable predictive modeling in real property, thereby empowering stakeholders with actionable insights for making knowledgeable choices inside the dynamic actual property marketplace.
Small-scale Literature Review
The literature on explainable device getting to know (XML) techniques for predictive modeling in actual property has witnessed substantial increase in recent years. In this phase, we evaluation 5 computing magazine articles published after 2020, that specialize in XML techniques and their programs in real estate predictive modeling. The decided on articles are prepared primarily based on the techniques used and the form of explainability accomplished.
2.1 Article 1: An Introduction to Machine Learning
- Summary: This article affords an introduction to gadget learning (ML) concepts and their packages, aiming to familiarize readers with foundational thoughts in ML. While the thing does now not delve into the information and theoretical historical past of ML, it goals to equip readers, especially within the pharmacometrics and clinical pharmacology community, with essential gear to understand ML publications. Although the point of interest of the object is extensive, it serves as a foundational piece for expertise ML concepts in numerous domain names, which includes actual estate predictive modeling [1].
- Explainability Type: N/A (Not without delay related to explainable device mastering techniques)
- Advantages: Provides a foundational knowledge of device getting to know standards.
- Weaknesses: May lack unique insights into explainable gadget mastering techniques for actual estate predictive modeling.
2.2 Article 2: AI-Based on Machine Learning Methods for Urban Real Estate Prediction: A Systematic Survey
- Summary:This systematic survey explores the software of artificial intelligence (AI) primarily based on device studying strategies for city actual property prediction. The digitization of city regions has led to the emergence of AI-primarily based decision help structures, leveraging numerous device getting to know algorithms for residence price predictions. The survey highlights the growing significance of predictive modeling in urban actual property funding, driven by means of technological breakthroughs and a developing market for smart and environmentally pleasant homes. While the look at reveals a considerable awareness of guides from digitized countries together with the united states, China, India, Japan, and Hong Kong, it additionally identifies demanding situations related to statistics size and the desire for simple ML methods over deep gaining knowledge of strategies. The authors emphasize the want for destiny research to cope with the explainability challenges of the fashions built, permitting higher control of studies and enterprise model layout inside the actual property area [2].
- Explainability Type: Global explainability
- Advantages: Provides insights into the utility of AI and ML in urban actual estate prediction.
- Weaknesses: Limited discussion on precise explainable system mastering strategies.
2.3 Article 3: Real Estate Price Estimation in French Cities using Geocoding and Machine Learning
- Summary: This paper opinions actual property rate estimation in France, focusing on the relevance of location capabilities in real estate price estimation. The take a look at compares seven popular gadget mastering strategies and proposes an approach that quantifies the relevance of region capabilities with excessive and excellent ranges of granularity. Using a newly to be had open dataset supplied by the French government, the look at demonstrates widespread enhancements in forecasting accuracy while geocoding functions are considered. The results highlight the overall performance of neural networks and random woodland strategies inside the absence of geocoding capabilities, at the same time as random forest, AdaBoost, and gradient boosting carry out well whilst geocoding capabilities are blanketed. The findings have implications for identifying possibilities within the real property marketplace and might function a basis for price evaluation in revenue control [3].
- Explainability Type: Global explainability
- Advantages: Provides insights into the importance of place features in actual property price estimation and demonstrates the impact of geocoding features on predictive modeling accuracy.
- Weaknesses: Limited dialogue on the interpretability of the machine mastering models used.
2.4 Article 4: Using Machine Learning Models and Actual Transaction Data for Predicting Real Estate Prices
- Summary: This have a look at employs real transaction data and system getting to know fashions to are expecting actual property charges. Four system mastering fashions, together with least squares support vector regression (LSSVR), classification and regression tree (CART), general regression neural networks (GRNN), and backpropagation neural networks (BPNN), are hired to forecast real estate costs. Numerical consequences indicate that LSSVR outperforms the other three models in terms of forecasting accuracy, presenting pretty aggressive and first-class consequences for actual estate rate prediction [4].
- Explainability Type: N/A (Not explicitly discussed within the summary)
- Advantages: Employs actual transaction information for actual estate price prediction.
- Weaknesses: Lack of debate at the interpretability and explainability of the device gaining knowledge of models used.
2.5 Article 5: Price Forecasting for Real Estate Using Machine Learning: A Case Study on Riyadh City
- Summary: This article focuses on the correct forecasting of real estate prices, emphasizing the significance of such predictions for financial boom and policy-making. The have a look at employs system gaining knowledge of algorithms, which includes selection trees, random woodland (RF), and linear regression, to broaden fashions for land rate estimation in Riyadh, KSA. Using information accrued from lands inside the northern area of Riyadh, the take a look at assesses the performance of the advanced models based totally on trendy overall performance metrics. The experiments reveal that the RF-based model outperforms the other fashions, imparting treasured insights for land fee estimation in Riyadh [5].
- Explainability Type: N/A (Not explicitly discussed within the abstract)
- Advantages: Provides insights into machine mastering-primarily based land price forecasting in Riyadh.
- Weaknesses: Lack of dialogue at the interpretability and explainability of the device learning fashions used.
2.6 Comment
In summary, the literature assessment for university assignment help segment offers a numerous exploration of latest research articles focusing on machine learning strategies for real property predictive modeling. Each article contributes particular insights into the software of system gaining knowledge of in one-of-a-kind contexts, ranging from urban actual property prediction to land rate forecasting in specific cities. While some research emphasize the significance of region functions and actual transaction information, others focus on comparing various gadget mastering algorithms for predictive accuracy. However, a commonplace theme across these articles is the confined discussion at the interpretability and explainability of the gadget mastering models used, highlighting a potential location for destiny research. Despite this trouble, the reviewed articles together make a contribution to advancing our information of system studying techniques in real property predictive modeling and offer valuable insights for practitioners and researchers within the discipline.
Experimental Design
In this section, lets outline the methodology for our study, including data preprocessing, model construction, and performance evaluation.
3.1 Data Preprocessing
Let begin by loading the dataset from the 'melb_data.csv' file using the Pandas library. The dataset contains information about properties in Melbourne, including features such as the number of rooms, bathrooms, landsize, latitude, longitude, and the target variable, price [7].
Next, lets remove the nan values from the dataset to ensure clean data.
The code defines the target variable 'y' as the price of the properties and select the features which is used for modeling, that includes 'Rooms', 'Bathroom', 'Landsize', 'Lattitude', and 'Longtitude'.
3.2 Model Construction
Here we will use Decision Tree as our model trainer to start our coding with. Sklearn Library will help us with that.
3.3 Performance Evaluation
Now, a testing will be done to ensure the quality of our model. The code will predict the price for 5 random houses.
3.4 Comment
This section provides the significance of design methodlogy for the project. Initially, we load the dataset from the 'melb_data.Csv' document the use of the Pandas library, ensuring statistics integrity. Rows with missing values are removed to maintain records pleasant. The goal variable, belongings price ('Price'), is described, in conjunction with decided on functions which include 'Rooms', 'Bathroom', 'Landsize', 'Lattitude', and 'Longitude', which serve as enter for predictive modeling.
For version creation, we opt for the Decision Tree Regressor due to its simplicity and interpretability. The version is instantiated and educated on the chosen functions and goal variable. The use of the random_state parameter ensures reproducibility of results throughout multiple runs.
In the overall performance evaluation phase, predictions are made on a subset of the dataset, and the actual expenses are compared with the version's predictions. This lets in for an preliminary assessment of the model's effectiveness in taking pictures the relationships among features and assets charges. However, in addition assessment on a larger check set is vital to completely gauge the version's predictive overall performance and generalizability. Overall, this experimental layout provides a dependent technique to exploring predictive modeling in actual property the use of gadget getting to know techniques [8].
Experimental Results and Discussion
In this segment, we present the findings of our experiments and speak their implications for actual estate predictive modeling using gadget getting to know strategies.
4.1 Model Performance Evaluation
To determine the performance of our Decision Tree Regressor model, we made predictions on a subset of the dataset and compared them with the actual expenses of the properties. Here are the results:
The version predicts assets expenses for the given functions, with the predictions intently aligned with the actual fees. However, in addition evaluation on a bigger test set is vital to absolutely verify the model's generalizability and overall performance on unseen records.
4.2 Model Accuracy
To quantitatively examine the accuracy of the version, we calculate the Mean Absolute Error (MAE) between the expected charges and the real fees. Lower MAE values indicate higher version overall performance.
The MAE provides a degree of the average absolute distinction between the predicted and actual costs. This metric lets in us to evaluate the model's accuracy and pick out areas for improvement in predictive performance.
4.3 Visualization
We visualize the predicted costs in opposition to the real prices the use of a scatter plot. Additionally, we present a histogram to visualise the distribution of the errors (residuals) between the anticipated and real fees [9].
the chart:
Insights
It is a line graph showing a evaluation of real expenses and anticipated expenses. The x-axis is labeled "Actual Price" and the y-axis is classified "Predicted Price". The scale on the x-axis goes from zero to one observed via an exponent of 6, that is likely 1,000,000. The scale on the y-axis goes from 0 to eight.
The identify of the chart is "Actual vs Predicted Prices". There are information points plotted at the chart, but there is no legend to suggest what the statistics factors represent. Without a legend, it's far hard to say for sure what the data points constitute. However, we can draw a few popular conclusions from the chart.
The line on the chart slopes downward from left to proper. This indicates that the anticipated expenses are typically higher than the actual costs. The farther a records point is beneath the line, the more the distinction between the expected rate and the real charge.
For example, if a statistics factor is plotted at a vicinity wherein the x-axis cost is 2 and the y-axis fee is 4, this would mean that the real fee became 2 and the expected fee was four.
4.4 Discussion
The findings endorse that the Decision Tree Regressor version shows promise in predicting real estate charges primarily based on features consisting of the wide variety of rooms, lavatories, landsize, range, and longitude. However, it's miles crucial to renowned that these results are based on a confined subset of the dataset and won't absolutely represent the model's overall performance on unseen statistics.
Future research could focus on the following areas:
- Feature Engineering: Exploring extra functions or alterations that could enhance the model's predictive performance.
- Model Evaluation: Conducting go-validation and performance evaluation on a separate check set to validate the version's effectiveness.
- Model Interpretability: Investigating strategies to enhance the interpretability of the version, permitting stakeholders to recognize the factors influencing property charges.
Overall, these findings provide precious insights into the application of gadget gaining knowledge of strategies for real property predictive modeling. Further studies and experimentation are had to expand extra sturdy and interpretable models for this domain.
Conclusion
In this take a look at, we explored the software of device learning techniques for real estate predictive modeling the usage of a Decision Tree Regressor model. Our research yielded valuable insights into the version's performance and potential implications for the real estate enterprise.
Key Findings
- The mode uses properties like longitude, latitude to predict the price of the house. The mode
- The version's predictions carefully aligned with the actual prices, indicating its ability software in helping real estate stakeholders in making informed selections.
- The Mean Absolute Error (MAE) metric furnished a quantitative degree of the version's accuracy, with lower MAE values indicating higher predictive overall performance [10].
Implications and Future Directions
- The successful utility of system studying strategies in actual property predictive modeling opens up opportunities for reinforcing selection-making processes within the industry.
- Future studies should focus on incorporating additional capabilities, exploring superior gadget studying algorithms, and enhancing version interpretability to in addition enhance predictive accuracy and facilitate stakeholder expertise.
- Collaboration between information scientists, real estate specialists, and policymakers is vital to leverage machine gaining knowledge of era efficaciously and ethically inside the real property area.
Limitations
- Our have a look at become confined by means of using a unmarried machine studying model and dataset. Further experimentation with numerous fashions and datasets should provide a greater complete understanding of predictive modeling in real property.
- The interpretability of the Decision Tree Regressor version may be restricted, necessitating the exploration of strategies to enhance version explainability for stakeholders.
References
[1] A. Louati, R. Lahyani, A. Aldaej, A. Aldumaykhi, and S. Otai, “Price forecasting for real estate using machine learning: A case study on Riyadh city,” Concurrency and Computation: Practice and Experience, vol. 34, no. 6, Dec. 2021, doi: https://doi.org/10.1002/cpe.6748.
[2] P.-F. Pai and W.-C. Wang, “Using Machine Learning Models and Actual Transaction Data for Predicting Real Estate Prices,” Applied Sciences, vol. 10, no. 17, p. 5832, Aug. 2020, doi: https://doi.org/10.3390/app10175832.
[3] D. Tchuente and S. Nyawa, “Real estate price estimation in French cities using geocoding and machine learning,” Annals of Operations Research, Feb. 2021, doi: https://doi.org/10.1007/s10479-021-03932-5.
[4] Stéphane C. K. Tekouabou, ?tefan Cristian Gherghina, Eric Désiré Kameni, Youssef Filali, and Khalil Idrissi Gartoumi, “AI-Based on Machine Learning Methods for Urban Real Estate Prediction: A Systematic Survey,” Archives of Computational Methods in Engineering, Oct. 2023, doi: https://doi.org/10.1007/s11831-023-10010-5.
[5] S. Badillo et al., “An Introduction to Machine Learning,” Clinical Pharmacology & Therapeutics, vol. 107, no. 4, pp. 871–885, Mar. 2020, doi: https://doi.org/10.1002/cpt.1796.
[6] A. Jung, Machine Learning: The Basics. Springer Nature, 2022. Accessed: Apr. 11, 2024. [Online]. Available: https://books.google.com/books?hl=en&lr=&id=1IBaEAAAQBAJ&oi=fnd&pg=PR8&dq=machine+learning&ots=XT7KElmFUZ&sig=QvuYVet2v
_ijBTauOQ2fZ8So1Yw.
[7] B. Charbuty and A. Abdulazeez, “Classification Based on Decision Tree Algorithm for Machine Learning,” Journal of Applied Science and Technology Trends, vol. 2, no. 01, pp. 20–28, Mar. 2021
[8] M. Bansal, A. Goyal, and A. Choudhary, “A comparative analysis of K-Nearest Neighbor, Genetic, Support Vector Machine, Decision Tree, and Long Short Term Memory algorithms in machine learning,” Decision Analytics Journal, vol. 3, p. 100071, Jun. 2022, doi: https://doi.org/10.1016/j.dajour.2022.100071.
[9] S. Raschka, J. Patterson, and C. Nolet, “Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence,” Information, vol. 11, no. 4, p. 193, Apr. 2020, Available: https://www.mdpi.com/2078-2489/11/4/193.
[10] A. Grybauskas, V. Pilinkien?, and A. Stundžien?, “Predictive analytics using Big Data for the real estate market during the COVID-19 pandemic,” Journal of Big Data, vol. 8, no. 1, Aug. 2021, doi: https://doi.org/10.1186/s40537-021-00476-0.