Scikit learn design principles

In this post, we will take a look at the design principles behind the very popular Scikit Learn library.

If you are into machine learning or deep learning, you are probably familiar with the Scikit Learn library. Beginners may only have a rough idea of how things work, and this post will help them get a general overview of this open source library.


The following topics will be covered in this post:
1.What is Scikit Learn
2.Some details about Scikit Learn
3.General design principles
4.General API

1.What is Scikit Learn?
Scikit Learn is an open source library written in Python which supports many machine learning tasks such as classification, regression, and clustering, among others. It was designed to work in harmony with other libraries like NumPy and SciPy.

2.Some more details about Scikit Learn
The library was initially started by David Cournapeau as a Google Summer of Code project in 2007. Developers at the French Institute for Research in Computer Science and Automation (INRIA) later worked on it extensively, and the first public release was made on February 1, 2010.

Most of the library is written in Python, but some of the algorithms are written in Cython to improve performance.

3.General Design principles[1]
A few principles were followed while designing the interfaces in order to avoid frequent API changes and improve code maintainability.

The following are the general principles behind the Scikit Learn API design:


Consistency: All objects share a consistent interface made up of a limited set of methods, and they are documented in the same consistent way.
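
As a rough illustration of this consistency, the sketch below trains two unrelated algorithms through the same two calls. The data set is made up for this post, and the choice of LogisticRegression and DecisionTreeClassifier is just an example; any other Scikit Learn classifier should work the same way.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up data set: one feature, two well-separated classes.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])
X_new = np.array([[1.5], [10.5]])

predictions = {}
for model in (LogisticRegression(), DecisionTreeClassifier(random_state=0)):
    # The same two calls work regardless of the algorithm behind the object.
    model.fit(X, y)
    predictions[type(model).__name__] = model.predict(X_new)

for name, labels in predictions.items():
    print(name, labels)
```

Swapping one algorithm for another only changes the line that creates the object; the rest of the code stays the same.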

Inspection: Constructor parameters and the parameter values determined by the learning algorithms are stored and exposed as public attributes.
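
A small sketch of what inspection looks like in practice, using LinearRegression on made-up data that follows y = 2x + 1. By Scikit Learn convention, values learned during fitting are stored in attributes ending with an underscore.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data following y = 2*x + 1 exactly.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression(fit_intercept=True)

# Constructor arguments are stored as public attributes of the same name
# and can also be listed through get_params().
print(model.fit_intercept)
print(model.get_params())

model.fit(X, y)

# Values learned from the data are public too (note the trailing underscore).
print(model.coef_, model.intercept_)
```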

Non-proliferation of classes: Learning algorithms are the only objects represented using custom classes. Data sets are represented as NumPy arrays or SciPy sparse matrices, and hyperparameter names and values are expressed as standard Python strings and numbers.
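
The sketch below shows this with made-up data: the same estimator class is trained once on a plain NumPy array and once on a SciPy sparse matrix, and its hyperparameters are an ordinary number and an ordinary string. LogisticRegression is only an example here; not every estimator accepts sparse input.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression

# Made-up data set with many zeros, stored as a plain NumPy array.
X_dense = np.array([
    [1.0, 0.0], [2.0, 0.0], [1.5, 0.0],
    [0.0, 8.0], [0.0, 9.0], [1.0, 10.0],
])
y = np.array([0, 0, 0, 1, 1, 1])

# The same data as a SciPy sparse matrix; no special data set class is needed.
X_sparse = sparse.csr_matrix(X_dense)

# Hyperparameters are a standard Python number and string.
dense_model = LogisticRegression(C=1.0, solver="lbfgs").fit(X_dense, y)
sparse_model = LogisticRegression(C=1.0, solver="lbfgs").fit(X_sparse, y)

dense_pred = dense_model.predict(X_dense)
sparse_pred = sparse_model.predict(X_sparse)
print(dense_pred, sparse_pred)
```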

Composition: Many machine learning tasks are expressible as sequences or combinations of transformations to data. Wherever feasible, algorithms are implemented and composed from these existing building blocks.
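
One way to see composition at work is the Pipeline class, which chains building blocks into a single object. This is a minimal sketch on made-up data; the step names "scale" and "classify" are arbitrary labels chosen for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data set with two features on very different scales.
X = np.array([
    [0.0, 100.0], [1.0, 110.0], [2.0, 120.0],
    [10.0, 200.0], [11.0, 210.0], [12.0, 220.0],
])
y = np.array([0, 0, 0, 1, 1, 1])

# Chain a scaling step and a classifier into one composite object.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("classify", LogisticRegression()),
])

# The pipeline is used like any other estimator: fit, then predict.
pipeline.fit(X, y)
labels = pipeline.predict(X)
print(labels)
```

Because the pipeline itself follows the common interface, it can be passed anywhere a single estimator is expected.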

Sensible defaults: Whenever an operation requires a user-defined parameter, the library defines an appropriate default value, so that a basic flow works out of the box and gives a sensible baseline output.
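
For example, a nearest neighbours classifier can be created without passing any arguments at all. The sketch below uses made-up data and relies on the default of 5 neighbours.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up data set: one feature, two well-separated classes.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# No arguments given: every hyperparameter falls back to its default.
model = KNeighborsClassifier()
print(model.n_neighbors)  # default number of neighbours

model.fit(X, y)
labels = model.predict(np.array([[1.0], [11.0]]))
print(labels)
```

The defaults are a starting point for a baseline; they can be overridden in the constructor once you know what the problem needs.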

For more details, refer to the link: Scikit Learn Design Principles

4.General API[1]
All objects in the Scikit Learn library share a small set of common interfaces.

Those interfaces are:
A. Predictor
B. Estimator
C. Transformer

A.Predictor
The predictor interface is used for making predictions on a given data set. It extends the estimator interface by adding a predict method.
The predict method makes predictions using the trained model. Predictors may also provide methods that quantify the confidence of the predictions, as well as a score method to assess their performance on a batch of data.
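
A minimal sketch of the predictor interface, using LogisticRegression on made-up data. Here predict_proba and decision_function are the confidence-related methods; not every predictor offers both.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data set: one feature, two well-separated classes.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])
X_new = np.array([[1.5], [10.5]])

clf = LogisticRegression().fit(X, y)

labels = clf.predict(X_new)               # predicted class labels
probabilities = clf.predict_proba(X_new)  # one row per sample, one column per class
margins = clf.decision_function(X_new)    # signed distance from the decision boundary
accuracy = clf.score(X, y)                # mean accuracy on the given data

print(labels, probabilities, margins, accuracy)
```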

B.Estimator
The estimator interface is at the core of Scikit Learn. It defines how objects are instantiated and exposes a fit method for learning a model from training data. An estimator's constructor does not see any data; it only accepts hyperparameters, which are stored as public attributes. The fit method accepts the training data and fits the model.
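
The separation between construction and fitting can be sketched as follows, using KMeans on made-up points. The hyperparameter values are arbitrary choices for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

# The constructor only receives hyperparameters; no data is involved yet.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
fitted_before = hasattr(kmeans, "cluster_centers_")  # nothing learned so far

# Made-up points forming two obvious groups.
X = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],
])

# fit() is where the data comes in; it returns the estimator itself.
returned = kmeans.fit(X)

print(fitted_before, returned is kmeans)
print(kmeans.cluster_centers_)
```

Since fit returns the estimator, calls can be chained, for example KMeans(n_clusters=2).fit(X).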

C.Transformer
The transformer interface is also an extension of the estimator API, and it exposes a transform method. The transform method accepts an array of input data and returns a transformed version of that data. The transformation is based on the parameters learned when the estimator was fitted.
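
In Scikit Learn, transformers such as StandardScaler expose a transform method, plus fit_transform as a shortcut that does both steps at once. A minimal sketch on made-up data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data set with two features on different scales.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

scaler = StandardScaler()
scaler.fit(X)                    # learn the per-column mean and scale
X_scaled = scaler.transform(X)   # apply the learned transformation

# fit_transform combines both steps in a single call.
X_scaled_again = StandardScaler().fit_transform(X)

print(scaler.mean_)
print(X_scaled)
```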

For more details, refer to the link: Scikit Learn Design Principles

That's all for this post!
Thank you for reading.
If you have any suggestions regarding the post contents, or if you need more details on any other topic, please post them in the comments section.
Your suggestions are valuable and should not be missed.


References
1.https://arxiv.org/pdf/1309.0238.pdf
2.https://en.wikipedia.org/wiki/Scikit-learn
