Why we need Machine Learning for web development:
- In recent times, the exponential growth of artificial intelligence (AI) and machine learning (ML) has made people think about applying AI to every possible area of human life.
- AI and ML already play a huge role in many everyday applications that make human life more manageable.
- Given how many applications AI and ML already have, it is natural to apply them to one of the most widely used things on the internet: websites.
- The web has grown exponentially in recent years. Billions of websites, whether for e-commerce, advertising, entertainment, or anything else, already work hard to serve their users. Still, we can optimize them further for both owner and consumer if we scale up performance and make the experience smoother using AI algorithms.
- In ML, data collection is crucial. Whether the data is synthetic or real-world, it is what the ML algorithms are applied to.
Use of Machine Learning in our project:
- Here, we have used machine learning for blog recommendation. Nowadays, many e-commerce websites such as Amazon, Flipkart, and Nykaa use recommendation systems that surface relevant results to users and keep them engaged.
- To fit a model into a website we can use either JavaScript or Python. With JavaScript, we can either use the built-in TensorFlow.js library or develop a custom model to our requirements and deploy it with the Node.js framework. With Python, we can train supervised or unsupervised models and deploy them using either the Flask or Django framework.
- To fit models into Jekyll, from the above two options we selected Python models such as SVD, KNN, cosine similarity, Naive Bayes, etc., served through the Flask framework; a minimal sketch of such a Flask endpoint follows this list.
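The sketch below shows, under our assumptions, how a Python model can be exposed to the website through a small Flask endpoint. The `/recommend` route and the `recommend_blogs` helper are illustrative placeholders, not our exact production code.

```python
# Minimal Flask sketch of a recommendation endpoint.
# The /recommend route and recommend_blogs helper are illustrative placeholders.
from flask import Flask, jsonify, request

app = Flask(__name__)

def recommend_blogs(title, top_n=4):
    # Placeholder: in the real project this would query the trained
    # cosine-similarity model and return the most similar blog titles.
    return ["blog-a", "blog-b", "blog-c", "blog-d"][:top_n]

@app.route("/recommend")
def recommend():
    title = request.args.get("title", "")
    return jsonify({"title": title, "recommendations": recommend_blogs(title)})

if __name__ == "__main__":
    app.run(port=5000)
```

The Jekyll front end can then fetch recommendations from such an endpoint with a simple HTTP request.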
What we studied:
We explored the following web frameworks to build our website.
- Dgraph: It is a distributed graph database that provides horizontal scalability, distributed cluster-wide ACID transactions, low-latency arbitrary-depth joins, synchronous replication, high availability and crash resilience.
- Hugo: It is a web framework for creating static websites. Using Hugo, we build static web pages, and when content is updated Hugo regenerates the page for that content. Websites built with Hugo are fast and secure. Hugo can integrate with services such as Firebase, Google Cloud Storage, Amazon S3, and CloudFront. Hugo accepts a directory of files and templates as input and uses it to build a whole website. Hugo lets you export your content in various formats, including JSON and AMP, and makes it simple to define your own.
- Gatsby: It is a website generator based on the React framework that uses Webpack and GraphQL. Gatsby features a common data layer that makes it simple for developers to combine data from various sources and present it side by side. Gatsby compiles and builds quickly, which helps it produce some of the fastest websites on the web.
- Jekyll: Jekyll is a static site generator based on the Ruby language. Jekyll can be used to create web pages with extensive and intuitive navigation. It does not use a database: pages are generated at build time, and content from JSON files can be imported into Jekyll. To generate templates, Jekyll uses the Liquid templating language. Jekyll suits people who want to keep all of their content, as well as the code that runs their website, under their own control.
- After comparing these frameworks, we concluded that Jekyll was the right choice for our website. Jekyll creates fewer components and hence requires less maintenance, which makes it easier to use and faster than the other frameworks. It also only requires a web server that can serve files, so the requested files are returned to the client without any extra processing.
Models Used (Supervised/Unsupervised):
1. TensorFlow:
First, we thought of implementing the recommendation system in JavaScript, which means employing machine learning in JavaScript. For that we looked at the TensorFlow.js library, which lets us create, train, and run machine learning models directly in JavaScript. With it, web development and machine learning can be combined, and it is a good option for JavaScript developers who want to build machine learning models without learning Python.
It is also compatible with Node.js, a server-side framework, so it can work on both the server side and the client side. Users can either develop a new machine learning model from scratch or reuse and retrain existing models. There are some restrictions, such as data limits, and it is mainly designed for the browser. We can also use it in Node.js, although this requires a powerful machine; as a result, it is applied primarily to small models. Because of these disadvantages, we chose not to build our recommendation system with it.
2. Pinecone:
Following that, we looked into Pinecone, which can be used to create a high-performance vector search application. We can build a recommendation system using Pinecone's similarity search feature. To do so, we must first generate a Pinecone API key. The next step is to create an Index, which is Pinecone's highest-level organizational unit for vector data. When creating an index, we can also specify the number of pods; pods are pre-configured hardware units that run the Pinecone service, and the value defaults to 1 but can be changed to suit our needs. Next, we apply vector embedding to our dataset when uploading it, which is the method of reducing a large amount of text or any other object to a vector; it also converts numerical data to a vector for easier operations. We can then begin sending queries to Pinecone, which retrieves the ids of the most similar vectors along with their similarity scores. It is most commonly used when building a recommendation system with content-based filtering. A rough sketch of this flow is shown below.
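As a rough sketch of that flow, based on the pinecone-client Python package: the index name, dimension, and vectors below are made-up assumptions, and the exact client calls may differ between Pinecone client versions.

```python
# Sketch of the Pinecone flow described above (pinecone-client style API).
# Index name, dimension, and vectors are made up; verify the calls against
# the client version you install.
import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")

# Create an index; pods defaults to 1 but can be raised for more capacity.
pinecone.create_index("blog-recommender", dimension=4, pods=1)
index = pinecone.Index("blog-recommender")

# Upsert a few toy blog embeddings as (id, vector) pairs.
index.upsert(vectors=[
    ("blog-1", [0.1, 0.3, 0.5, 0.7]),
    ("blog-2", [0.2, 0.3, 0.4, 0.6]),
])

# Query with an embedding; Pinecone returns the closest ids and scores.
result = index.query(vector=[0.1, 0.3, 0.5, 0.7], top_k=2)
print(result)
```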
3. KNN:
KNN can be used to recommend items to users in content-based systems, collaborative filtering systems, and hybrid systems. We used content-based filtering to recommend articles: the article title, tags, development area, etc. are used to create a profile for each article. To detect commonalities we can use the Jaccard coefficient or cosine similarity.
Both similarity functions could be used here, but our data will be high-dimensional, and Euclidean distance is unhelpful there because all vectors end up almost equidistant from the search query vector. So instead, we use cosine similarity for the nearest-neighbour search. Approximate nearest-neighbour search is another popular approach for handling high-dimensional data.
Based on this analysis, we finalized the KNN model with cosine similarity to recommend blogs to users. For the recommendation, it measures the cosine of the angle between two vectors built from the preprocessed data in multi-dimensional space and outputs a score in the range 0 to 1 (0 means no similarity and 1 means 100% similar). From this score, we can easily identify blog articles similar to the selected article and recommend them to the users. A small worked example follows.
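As a tiny worked example, the vectors below are made up, standing in for TF-IDF features of two blogs; for non-negative vectors like these the cosine score lands in the 0 to 1 range.

```python
# Toy cosine-similarity check between two made-up TF-IDF-style blog vectors.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

blog_a = np.array([[0.4, 0.0, 0.8, 0.2]])
blog_b = np.array([[0.3, 0.1, 0.7, 0.0]])

score = cosine_similarity(blog_a, blog_b)[0, 0]
print(round(score, 3))  # ~0.97, so the two blogs look very similar
```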
Methodology:
Here we have data in the form of markdown files which contain all the blog details. So first we stored the markdown data in a CSV containing all the required details, such as blog title, author, specialization tags, GitHub link, blog description, date, and blog read time. A sketch of this extraction step is shown below.
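One possible way to do this extraction is sketched below using the python-frontmatter package; the front-matter field names (title, author, tags, description) and the _posts/ path are assumptions about the markdown layout, not the exact schema we used.

```python
# Sketch: collect blog front-matter fields from markdown files into a CSV.
# Field names and the _posts/ path are assumed, not exact.
import csv
import glob
import frontmatter

rows = []
for path in glob.glob("_posts/*.md"):
    post = frontmatter.load(path)
    meta = post.metadata
    rows.append({
        "title": meta.get("title", ""),
        "author": meta.get("author", ""),
        "tags": ",".join(meta.get("tags", [])),
        "description": meta.get("description", ""),
    })

with open("blogs.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "author", "tags", "description"])
    writer.writeheader()
    writer.writerows(rows)
```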
We took the blog specialization tags and titles as the selected features, which are then used in the cosine similarity model. In pre-processing, the tags column is of string type, so we first converted this column to a list type. We also removed some symbols from the title of each blog.
We used TfidfVectorizer for feature extraction: we fitted the selected features to the TfidfVectorizer, transformed them, and stored the result as feature vectors. We then passed these feature vectors into the cosine similarity model to obtain a similarity score for every pair of feature vectors. Finally, for the selected blog we find the closest matches based on the similarity score and recommend those blogs. A sketch of this pipeline follows.
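Putting the steps above together, a sketch of the pipeline, assuming a blogs.csv with title and tags columns as described, could look like this:

```python
# Sketch of the pipeline: TF-IDF over title + tags, pairwise cosine
# similarity, then the top-4 most similar blogs for a selected title.
# Column names are assumptions about the CSV produced earlier.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

blogs = pd.read_csv("blogs.csv")
combined = blogs["title"].fillna("") + " " + blogs["tags"].fillna("")

vectorizer = TfidfVectorizer(stop_words="english")
feature_vectors = vectorizer.fit_transform(combined)

# similarity[i, j] = cosine similarity between blog i and blog j
similarity = cosine_similarity(feature_vectors)

def recommend(title, top_n=4):
    idx = blogs.index[blogs["title"] == title][0]
    ranked = similarity[idx].argsort()[::-1]          # most similar first
    ranked = [i for i in ranked if i != idx][:top_n]  # skip the blog itself
    return blogs["title"].iloc[ranked].tolist()

print(recommend("Some blog title"))
```

The recommend function returns four titles, which a Flask endpoint like the one sketched earlier can expose to the Jekyll site.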
Conclusion:
In this machine learning for web development project, we first studied different types of web frameworks and several machine learning models used on the web, and then selected the most appropriate web framework, Jekyll. As the machine learning model we chose the cosine similarity model, which is trained on the blogs and, for the current blog, outputs the four most related blogs. We hosted our Python code as an API that takes the title of a blog as a parameter and returns the four best-recommended blog titles, and we have integrated this API into the Jekyll web framework using the Liquid language and JavaScript.
Smit Dharaiya
An innovative thinker, initiative taker, and multi-dimensional professional with exceptional logical and analytical skills, pursuing a master's degree in the CSE department at IIIT Delhi.
Vishva Nathvani
I completed my graduation from Charusat University in the field of computer engineering and am currently pursuing an M.Tech from IIITD. I am very enthusiastic about learning new technologies and solving competitive problems.
Vidhya Kothadia
Currently, I am pursuing my M.Tech from IITD. I completed my bachelor's in Computer Engineering from CHARUSAT University in 2021. I am passionate about finding new approaches to a given problem and providing solutions to real-life problems.