{"id":1706,"date":"2019-05-20T02:17:52","date_gmt":"2019-05-20T02:17:52","guid":{"rendered":"http:\/\/kusuaks7\/?p=1311"},"modified":"2023-09-12T13:17:30","modified_gmt":"2023-09-12T13:17:30","slug":"being-a-data-scientist-does-not-make-you-a-software-engineer","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/being-a-data-scientist-does-not-make-you-a-software-engineer\/","title":{"rendered":"Being a Data Scientist does not make you a Software Engineer!"},"content":{"rendered":"<section>\n<h3 id=\"c1dd\">Disclaimer<\/h3>\n<p id=\"f75b\">Hopefully, I caught your attention with the controversial title. Great! Now bear with me as I am going to show you how you can build a scalable architecture to surround your witty Data Science solution!<\/p>\n<p id=\"8ad4\">I am starting a\u00a0<strong>series of 2 articles<\/strong>\u00a0that will cover the basics of software engineering with regards to architecture and design and how to apply these on each step of the Machine Learning Pipeline:<\/p>\n<blockquote id=\"35fd\"><p><a href=\"https:\/\/towardsdatascience.com\/being-a-data-scientist-does-not-make-you-a-software-engineer-c64081526372?source=friends_link&amp;sk=fd1e5ace8c5bfdaa6e1b1ace201dbff1\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\"><strong><em>Part 1<\/em><\/strong><\/a><em>: Problem Statement | Architectural Styles | Design Patterns | SOLID<\/em><br \/>\n<a href=\"https:\/\/towardsdatascience.com\/architecting-a-machine-learning-pipeline-a847f094d1c7?source=friends_link&amp;sk=f934e209896d28b1f3a11f081cb18cb3\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\"><strong><em>Part 2<\/em><\/strong><\/a><em>: Architecting a Machine Learning Pipeline<\/em><\/p><\/blockquote>\n<\/section>\n<section>\n<hr \/>\n<h3 id=\"9405\">Introduction<\/h3>\n<p id=\"32fa\"><a href=\"https:\/\/towardsdatascience.com\/not-yet-another-article-on-machine-learning-e67f8812ba86\" target=\"_blank\" rel=\"noopener 
noreferrer\" class=\"broken_link\">As we have seen before<\/a>\u00a0in the famous Venn diagram of Steven Geringer, Data Science is the intersection of 3 disciplines: Computer Science, Mathematics\/Statistics and knowledge of a particular Domain.<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/towardsdatascience.com\/not-yet-another-article-on-machine-learning-e67f8812ba86\" action=\"image-link\" only=\"true\" class=\"broken_link\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*MuglQTETZNJCRp1iceE7-Q.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*MuglQTETZNJCRp1iceE7-Q.png\" \/><\/a><\/p>\n<p style=\"text-align: center;\">Data Science Venn Diagram [Copyright Steven Geringer]<\/p>\n<p id=\"4bcc\">Having basic (or even advanced) programming skills is key to putting your end-to-end experiment together; however, it does not mean that you have created an application that is production-ready. Unless you have come into Data Science and Machine Learning (ML) from an\u00a0<strong>IT background<\/strong>\u00a0and have tangible experience in building enterprise, distributed, solid systems, your Jupyter notebook does not qualify as a great piece of software and sadly does not make you a Software Engineer!<\/p>\n<p id=\"f378\">What you have built is a great\u00a0<strong>prototype<\/strong>\u00a0of a predictive product, but you still have to push it through the engineering roadmap. 
What you need is a team of professional Software Engineers by your side to take your (disposable) proof of concept and turn it into a\u00a0<strong>performant<\/strong>,\u00a0<strong>reliable<\/strong>,\u00a0<strong>loosely coupled<\/strong>\u00a0and\u00a0<strong>scalable<\/strong>\u00a0system!<\/p>\n<blockquote id=\"33c6\"><p>Everything is designed; few things are designed\u00a0well!<\/p><\/blockquote>\n<p id=\"3a3c\">In this series we will see some ideas for how this can be achieved\u2026 We will start with the basics in Part 1, and gradually design the holistic architecture in Part 2. The suggested architecture will be\u00a0<strong>technology agnostic.\u00a0<\/strong>The ML pipeline will be broken down into layers with clear demarcation of responsibilities, and at each layer, we can choose from a number of technology stacks.<\/p>\n<p id=\"a316\">But let\u2019s start by defining what a successful solution looks like!<\/p>\n<\/section>\n<section>\n<hr \/>\n<h3 id=\"3cc7\">Problem Statement<\/h3>\n<p id=\"7730\">The main objectives are to build a system that:<\/p>\n<blockquote id=\"42a7\"><p><em>\u25b8 Reduces\u00a0<\/em><strong><em>latency<\/em><\/strong><em>;<br \/>\n\u25b8 Is integrated but\u00a0<\/em><strong><em>loosely coupled<\/em><\/strong><em>\u00a0with the other parts of the system, e.g. data stores, reporting, graphical user interface;<br \/>\n\u25b8 Can\u00a0<\/em><strong><em>scale<\/em><\/strong><em>\u00a0both horizontally and vertically;<br \/>\n\u25b8 Is\u00a0<\/em><strong><em>message driven<\/em><\/strong><em>\u00a0i.e. the system communicates via asynchronous, non-blocking message passing;<br \/>\n\u25b8 Provides efficient computation with regard to\u00a0<\/em><strong><em>workload management<\/em><\/strong><em>;<br \/>\n\u25b8 Is\u00a0<\/em><strong><em>fault-tolerant<\/em><\/strong><em>\u00a0and self-healing i.e. 
breakdown management;<br \/>\n\u25b8 Supports\u00a0<\/em><strong><em>batch<\/em><\/strong><em>\u00a0and\u00a0<\/em><strong><em>real-time<\/em><\/strong><em>\u00a0processing.<\/em><\/p><\/blockquote>\n<\/section>\n<section>\n<hr \/>\n<h3 id=\"662e\">Architectural Styles<\/h3>\n<p id=\"026f\">We will first introduce what a reactive system is and will proceed to a quick tour of the most prevalent architectural patterns.<\/p>\n<h4 id=\"b3d2\">Reactive Systems<\/h4>\n<p id=\"6755\">The reactive systems design paradigm is a coherent approach to building better systems, which are designed according to the tenets of the\u00a0<a href=\"https:\/\/www.reactivemanifesto.org\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.reactivemanifesto.org\">Reactive Manifesto<\/a>. Each reactive principle maps to an important system dimension of scalability:<br \/>\n\u2022\u00a0<em>Responsive<\/em>\u00a0\u2192 Time<br \/>\n\u2022\u00a0<em>Elastic<\/em>\u00a0\u2192 Load<br \/>\n\u2022\u00a0<em>Resilient<\/em>\u00a0\u2192 Error<br \/>\n\u2022\u00a0<em>Message Driven<\/em>\u00a0\u2192 Communication.<\/p>\n<p style=\"text-align: center;\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*GtONM09zvA4GqznK9oSv3A.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*GtONM09zvA4GqznK9oSv3A.png\" \/><\/p>\n<p style=\"text-align: center;\">Features of Reactive\u00a0Systems<\/p>\n<\/section>\n<section>\n<hr \/>\n<h4 id=\"8393\">Service Oriented Architecture (SOA)<\/h4>\n<p id=\"10c8\">SOA centres around the concept of decomposing business problems into services. The services share information via the network and they also share code (i.e. common components) to maintain consistency and reduce development effort.<br \/>\nThe service\u00a0<strong>provider<\/strong>\u00a0publishes a contract that specifies the nature of the service and how to use it. 
The service\u00a0<strong>consumer<\/strong>\u00a0can locate the service metadata in the registry and develop the required client components to bind to it and use it.<\/p>\n<p id=\"5131\">An\u00a0<strong>orchestrator<\/strong>\u00a0is a composite service which is responsible for invoking and combining other services. Alternatively,\u00a0<strong>choreography<\/strong>\u00a0employs a decentralised approach for service composition, i.e. services interact by exchanging messages\/events.<\/p>\n<p style=\"text-align: center;\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*IP1EQNjUFUSo0qWS0LSNzQ.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*IP1EQNjUFUSo0qWS0LSNzQ.png\" \/><\/p>\n<p style=\"text-align: center;\">SOA<\/p>\n<\/section>\n<section>\n<hr \/>\n<h4 id=\"ca6c\">Streaming Architecture<\/h4>\n<p id=\"27eb\">A streaming architecture comprises the following components:<\/p>\n<ul>\n<li id=\"87c7\"><strong>Producers<\/strong>: Applications that generate and send messages<\/li>\n<li id=\"d2eb\"><strong>Consumers<\/strong>: Applications that subscribe to and consume messages<\/li>\n<li id=\"9c1d\"><strong>Topics<\/strong>: Streams of records belonging to a particular category and stored as a sequence of ordered and immutable records partitioned and replicated across a distributed cluster<\/li>\n<li id=\"0343\"><strong>Stream Processors<\/strong>: Applications that process messages in a certain manner (e.g. 
data transformations, ML models, etc.).<\/li>\n<\/ul>\n<figure id=\"7316\"><canvas width=\"75\" height=\"27\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 266px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*ncJK2FS-o19kNc9hhTWRZA.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*ncJK2FS-o19kNc9hhTWRZA.png\" \/><\/figure>\n<\/section>\n<section>\n<p style=\"text-align: center;\">Streaming Architecture<\/p>\n<hr \/>\n<h4 id=\"49e4\">Lambda Architecture<\/h4>\n<p id=\"114c\">The Lambda (\u03bb) Architecture is designed to handle both\u00a0<strong>real-time<\/strong>\u00a0and historically aggregated\u00a0<strong>batched data<\/strong>\u00a0in an integrated fashion. It separates the duties of real-time and batch processing while query layers present a unified view of all of the data.<br \/>\nThe concept is simple: when data is generated, it is processed before it is stored, so analysis can include data generated in the last second, the last minute, or the last hour by processing only the incoming data, not all the data.<\/p>\n<figure id=\"1169\" data-scroll=\"native\"><img decoding=\"async\" style=\"width: 700px; height: 232px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1200\/1*VcqxQTlGsF-FJXSBmAPD4Q.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/1200\/1*VcqxQTlGsF-FJXSBmAPD4Q.png\" \/><\/figure>\n<\/section>\n<section>\n<p style=\"text-align: center;\">Lambda Architecture<\/p>\n<hr \/>\n<h4 id=\"eb6c\">Microservice Architecture<\/h4>\n<p id=\"3be6\">Microservices is an architectural style that structures an application as a collection of small, autonomous, loosely coupled and collaborating services, modelled around a business domain. The services communicate using either synchronous protocols such as HTTP\/REST or asynchronous protocols such as AMQP. They can be developed and\u00a0<strong>deployed independently<\/strong>\u00a0of one another. 
Each service has its own database in order to be decoupled from other services.<\/p>\n<p style=\"text-align: center;\"><img decoding=\"async\" style=\"width: 546px; height: 466px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*VebcfjRrUjmJaNg7J_lwdg.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*VebcfjRrUjmJaNg7J_lwdg.png\" \/><\/p>\n<p style=\"text-align: center;\">Microservices Architecture<\/p>\n<\/section>\n<section>\n<hr \/>\n<h4 id=\"db32\">Representational State Transfer (REST) Architecture<\/h4>\n<p id=\"753f\">REST is an architectural style for developing\u00a0<strong>web services<\/strong>\u00a0and it builds upon existing features of the internet\u2019s HTTP. It allows transferring, accessing and manipulating textual data representations, in a stateless manner i.e. applications can communicate agnostically.<\/p>\n<p id=\"68b6\">A RESTful API service is exposed through a Uniform Resource Locator (URL), which provides the capability of data being created, requested, updated, or deleted (CRUD). 
It is best used to manage systems by decoupling the information that is produced and consumed from the technologies that produce and consume it!<\/p>\n<figure id=\"b4b5\"><canvas width=\"75\" height=\"35\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 339px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*BAx8U7noj4YGcoOTpiXcbA.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*BAx8U7noj4YGcoOTpiXcbA.png\" \/><\/figure>\n<\/section>\n<section>\n<p style=\"text-align: center;\">REST Architecture<\/p>\n<hr \/>\n<h3 id=\"58ed\">Design Patterns<\/h3>\n<p id=\"5b86\">We will only scratch the surface of this topic and discuss only those patterns that I may refer to in Part 2 of the series. [Hard to know just yet, but these are the patterns I use on a daily basis.]<\/p>\n<blockquote id=\"0806\"><p>A software design pattern is an optimised, repeatable solution to a commonly occurring problem in software engineering. It is a template for solving a problem that can be used in many different situations.<\/p><\/blockquote>\n<\/section>\n<section>\n<hr \/>\n<h4 id=\"6253\">Strategy<\/h4>\n<p id=\"427d\">The Strategy pattern defines a family of algorithms, puts each one in a separate class and makes them\u00a0<strong>interchangeable<\/strong>. Encapsulating the behaviour in separate classes eliminates the conditional statements that would otherwise pick an algorithm, and the correct algorithm (i.e. 
strategy) is chosen at run-time.<\/p>\n<p id=\"944b\"><strong>\u2014 Indication for usage<\/strong>: A business rule has several implementations, or different variants of an algorithm are needed.<\/p>\n<figure id=\"5cc4\"><canvas width=\"75\" height=\"52\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 498px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*JU5IDcyRleFNzPIvv-P-iA.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*JU5IDcyRleFNzPIvv-P-iA.png\" \/><\/figure>\n<\/section>\n<section>\n<p style=\"text-align: center;\">Strategy Pattern<\/p>\n<hr \/>\n<h4 id=\"630d\">Template Method<\/h4>\n<p id=\"dae8\">The Template Method abstracts a common process out of different procedures. It defines the\u00a0<strong>skeleton\u00a0<\/strong>of an algorithm, deferring some steps to sub-classes. The sub-classes can override some behaviour but cannot change the skeleton.<\/p>\n<p id=\"749d\"><strong>\u2014 Indication for usage<\/strong>: There is a consistent set of steps to follow, but individual steps may have different implementations.<br \/>\n\u2b50\ufe0f\u00a0<strong>Difference to Strategy Pattern<\/strong>:<br \/>\n\u2022 Template: Algorithm is selected at\u00a0<strong>compile-time<\/strong>\u00a0by\u00a0<strong>sub-classing<\/strong>.<br \/>\n\u2022 Strategy: Algorithm is selected at\u00a0<strong>run-time<\/strong>\u00a0by\u00a0<strong>containment<\/strong>.<\/p>\n<figure id=\"3229\"><canvas width=\"75\" height=\"40\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 387px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*m001ZMHTFrqF5bL0yBe3Aw.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*m001ZMHTFrqF5bL0yBe3Aw.png\" \/><\/figure>\n<\/section>\n<section>\n<p style=\"text-align: center;\">Template Method<\/p>\n<hr \/>\n<h4 id=\"0d06\">Chain of Responsibility<\/h4>\n<p id=\"7511\">The Chain of Responsibility pattern suggests avoiding coupling the client (sender of 
requests) with the receiver, by enabling one or more\u00a0<strong>handlers\u00a0<\/strong>to cater for the requests. These handlers are linked into a chain i.e. each handler has a reference to the next handler in the chain.<\/p>\n<figure id=\"d5b4\"><canvas width=\"75\" height=\"10\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 104px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*fuJn5bGmMOGDETdJ7l06lA.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*fuJn5bGmMOGDETdJ7l06lA.png\" \/><\/figure>\n<p id=\"a32a\"><strong>\u2014 Indication for usage<\/strong>: More than one object may handle a request, and neither the handler nor the sequence is known a priori.<\/p>\n<figure id=\"b0f6\"><canvas width=\"75\" height=\"45\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 439px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*WFU13YADVnaruOKDv-SOMA.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*WFU13YADVnaruOKDv-SOMA.png\" \/><\/figure>\n<\/section>\n<section>\n<p style=\"text-align: center;\">Chain of Responsibility<\/p>\n<hr \/>\n<h4 id=\"05d7\">Observer<\/h4>\n<p id=\"9c2c\">The Observer pattern (aka Publish\/Subscribe or PubSub for short) enables easy\u00a0<strong>broadcast\u00a0<\/strong>of communication by defining a one-to-many dependency between objects, so that when one object undergoes a change in state, all its dependents are notified and updated automatically. 
It is the observers\u2019 responsibility to register for the event they are \u2018observing\u2019.<\/p>\n<p id=\"2a3d\"><strong>\u2014 Indication for usage<\/strong>: A change to one object requires changing others, and you don\u2019t know how many objects need to be changed.<\/p>\n<figure id=\"d98e\"><canvas width=\"75\" height=\"50\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 479px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*ELA-CNXgpG1oKEebpNQT4w.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*ELA-CNXgpG1oKEebpNQT4w.png\" \/><\/figure>\n<\/section>\n<section>\n<p style=\"text-align: center;\">Observer Pattern<\/p>\n<hr \/>\n<h4 id=\"44a0\">Builder<\/h4>\n<p id=\"3804\">The Builder pattern is intended to construct a complex object in a\u00a0<strong>step-by-step<\/strong>\u00a0fashion and also separate the construction from its representation. In essence, it allows you to produce different types and representations of an object using the same code.<\/p>\n<p id=\"40dc\"><strong>\u2014 Indication for usage<\/strong>: Several kinds of complex objects can be built with the same overall build process, albeit with variation in the individual construction steps.<\/p>\n<figure id=\"7020\"><canvas width=\"75\" height=\"45\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 438px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*IIA0KIGSc9FI4moR_11fWQ.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*IIA0KIGSc9FI4moR_11fWQ.png\" \/><\/figure>\n<\/section>\n<section>\n<p style=\"text-align: center;\">Builder Pattern<\/p>\n<hr \/>\n<h4 id=\"14a2\">Factory Method<\/h4>\n<p id=\"03f8\">The Factory Method defines an interface for\u00a0<strong>creating objects<\/strong>, but the instantiation is done by sub-classes.<\/p>\n<p id=\"7ef6\"><strong>\u2014 Indication for usage<\/strong>: The exact types and dependencies of the objects are not known beforehand.<\/p>\n<figure id=\"f927\"><canvas width=\"75\" 
height=\"60\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 564px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*MW6c9owh-1cyx58NqDF7FA.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*MW6c9owh-1cyx58NqDF7FA.png\" \/><\/figure>\n<\/section>\n<section>\n<p style=\"text-align: center;\">Factory Method<\/p>\n<hr \/>\n<h4 id=\"ddf2\">Abstract Factory<\/h4>\n<p id=\"1077\">The Abstract Factory captures how to create\u00a0<strong>families of related products<\/strong>\u00a0without specifying their concrete classes.<\/p>\n<p id=\"7274\"><strong>\u2014 Indication for usage<\/strong>: Different cases require different implementations of sets of rules, and either those cases are unknown beforehand or extensibility is a concern.<br \/>\n\u2b50\ufe0f\u00a0<strong>Difference to Factory Method<\/strong>:<br \/>\n\u2022 Abstract Factory: Creates other factories, and these factories in turn create objects derived from base classes.<br \/>\n\u2022 Factory Method: Creates objects that derive from a particular base class.<\/p>\n<figure id=\"ac8a\"><canvas width=\"75\" height=\"50\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 473px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*QsvIVAEeIij4IQqqK5E_6w.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*QsvIVAEeIij4IQqqK5E_6w.png\" \/><\/figure>\n<\/section>\n<section>\n<p style=\"text-align: center;\">Abstract Factory<\/p>\n<hr \/>\n<h4 id=\"af03\">Decorator<\/h4>\n<p id=\"4431\">The Decorator pattern attaches new responsibilities to an object dynamically, by placing it inside a special wrapper class that contains these behaviours, so there is no impact on the signature of the original methods (composition over inheritance).<\/p>\n<p id=\"fd29\"><strong>\u2014 Indication for usage<\/strong>: Assigning extra behaviours to objects at run-time without breaking the code that uses these objects.<\/p>\n<figure id=\"7e24\"><canvas width=\"75\" 
height=\"52\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 500px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*adpb5OfoOIxG1eQMKrTM0w.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*adpb5OfoOIxG1eQMKrTM0w.png\" \/><\/figure>\n<\/section>\n<section>\n<p style=\"text-align: center;\">Decorator Pattern<\/p>\n<hr \/>\n<h4 id=\"b6a5\">Repository<\/h4>\n<p id=\"b4aa\">The Repository pattern centralises the code for data retrieval and persistence and provides an\u00a0<strong>abstraction for data access<\/strong>\u00a0operations, i.e. it acts like an in-memory collection of domain objects on which CRUD methods can be performed, and hides any database concerns from the caller.<\/p>\n<p id=\"be4c\"><strong>\u2014 Indication for usage<\/strong>: Decoupling the business logic from data access code.<\/p>\n<figure id=\"8f31\"><canvas width=\"75\" height=\"57\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 538px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*X_12PxJLBSGdaBIeC_aruw.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*X_12PxJLBSGdaBIeC_aruw.png\" \/><\/figure>\n<\/section>\n<section>\n<p style=\"text-align: center;\">Repository Pattern<\/p>\n<hr \/>\n<h4 id=\"24b7\">Little bonus<\/h4>\n<p id=\"d1d6\">Want to learn more about patterns? Start with the de facto standard book by the \u2018Gang of Four\u2019, namely: \u2018<a href=\"https:\/\/www.amazon.co.uk\/dp\/0201633612\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.amazon.co.uk\/dp\/0201633612\" data->Design Patterns: Elements of Reusable Object-Oriented Software<\/a>\u2019. 
The following diagram with the patterns\u2019 relationships is noteworthy\u200a\u2014\u200a<em>pretty spiffy, eh<\/em>?<\/p>\n<figure id=\"d836\"><canvas width=\"66\" height=\"75\"><\/canvas><img decoding=\"async\" style=\"width: 639px; height: 728px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*pSjrA-Yu_uy6U2Jv8zfZ6w.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*pSjrA-Yu_uy6U2Jv8zfZ6w.png\" \/><\/figure>\n<\/section>\n<section>\n<p style=\"text-align: center;\">Courtesy:\u00a0<a href=\"https:\/\/www.amazon.co.uk\/dp\/0201633612\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.amazon.co.uk\/dp\/0201633612\" data->Design Patterns: Elements of Reusable Object-Oriented Software<\/a><\/p>\n<hr \/>\n<h3 id=\"f930\">SOLID<\/h3>\n<p id=\"28fe\">We will only toy with the SOLID principles here, as they are essential for every software developer to know.<\/p>\n<p id=\"68df\">As\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Robert_C._Martin\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/en.wikipedia.org\/wiki\/Robert_C._Martin\" data->Uncle Bob<\/a>\u00a0says:\u00a0<em>\u201c<\/em><a href=\"https:\/\/sites.google.com\/site\/unclebobconsultingllc\/getting-a-solid-start\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/sites.google.com\/site\/unclebobconsultingllc\/getting-a-solid-start\" data-><em>They are not laws. They are not perfect truths. 
They are statements on the order of: An apple a day keeps the doctor away<\/em><\/a><em>\u201d.<\/em><\/p>\n<p id=\"0a80\">What this means is that they are not some kind of \u2018magic\u2019 that leads to the Promised Land of milk, honey and great software, but nevertheless they are crucial contributors to robust and long-lasting software.<\/p>\n<p id=\"9a83\">In a nutshell, these principles revolve around two major concepts, which are the building blocks for successful enterprise applications:\u00a0<strong>coupling<\/strong>\u00a0is the degree to which one class knows about and interacts with another class and\u00a0<strong>cohesion<\/strong>\u00a0indicates the degree to which a class has a single purpose. In other words:<\/p>\n<blockquote id=\"27b4\"><p>\u2022 Coupling is all about how classes interact with each other, and<br \/>\n\u2022 Cohesion focuses on how a single class is designed.<\/p><\/blockquote>\n<\/section>\n<section>\n<hr \/>\n<h4 id=\"2203\">Single Responsibility Principle<\/h4>\n<blockquote id=\"ac76\"><p><strong><em>A class should have one, and only one, reason to change.<\/em><\/strong><\/p><\/blockquote>\n<p id=\"fee6\">This is self-explanatory, but easier said than done: it is always tempting to add new behaviours to existing classes, but that\u2019s a recipe for disaster. Each behaviour could be a reason to change in the future, so fewer behaviours mean fewer opportunities to introduce bugs during changes.<\/p>\n<h4 id=\"bf02\">Open-Closed Principle<\/h4>\n<blockquote id=\"fecd\"><p><strong><em>You should be able to extend a class\u2019 behaviour, without modifying it.<\/em><\/strong><\/p><\/blockquote>\n<p id=\"edd9\">The classes you use should be open for extension but closed for modification. One way to achieve this is via inheritance, i.e. 
create a sub-class so that the original class stays closed for modification, while custom code added to the sub-class introduces the new behaviour.<\/p>\n<h4 id=\"eac9\">Liskov Substitution Principle<\/h4>\n<blockquote id=\"a8b0\"><p><strong><em>Derived classes must be substitutable for their base classes.<\/em><\/strong><\/p><\/blockquote>\n<p id=\"b547\">When extending the behaviour of a class A into a sub-class B, you must ensure that you can still replace A with B without breaking anything. This can be a bit tricky, especially when combining this principle with the Open-Closed one.<\/p>\n<h4 id=\"8588\">Interface Segregation Principle<\/h4>\n<blockquote id=\"1f57\"><p><strong><em>Make fine grained interfaces that are client specific.<\/em><\/strong><\/p><\/blockquote>\n<p id=\"7152\">Interfaces and classes must be as specialised as possible, so calling clients do not depend on methods they don\u2019t use. This goes hand in hand with the Single Responsibility principle.<\/p>\n<h4 id=\"23a8\">Dependency Inversion Principle<\/h4>\n<blockquote id=\"4200\"><p><strong><em>Depend on abstractions, not on concretions.<\/em><\/strong><\/p><\/blockquote>\n<p id=\"9240\">High-level classes should not depend on low-level ones. They should both depend on abstractions. Likewise, abstractions should not depend on details. Details should depend on abstractions.<\/p>\n<\/section>\n<section>\n<hr \/>\n<h4 id=\"9a35\">Little Bonus<\/h4>\n<p id=\"9abd\">I have created this quick reference diagram. 
If you wonder where my inspiration for the little symbols on the left has come from, please take a look at the \u2018<a href=\"https:\/\/blogs.msdn.microsoft.com\/cdndevs\/2009\/07\/15\/the-solid-principles-explained-with-motivational-posters\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/blogs.msdn.microsoft.com\/cdndevs\/2009\/07\/15\/the-solid-principles-explained-with-motivational-posters\/\" data->The SOLID Principles, Explained with Motivational Posters<\/a>\u2019 article. I love how the author has put a fun twist on the principles \ud83d\udc24.<\/p>\n<figure id=\"7a96\"><canvas width=\"75\" height=\"75\"><\/canvas><img decoding=\"async\" style=\"width: 640px; height: 640px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*bUTXREaWLOhcoBAU6Vokrg.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*bUTXREaWLOhcoBAU6Vokrg.png\" \/><\/figure>\n<\/section>\n<section>\n<p style=\"text-align: center;\">SOLID<\/p>\n<hr \/>\n<h3 id=\"edc8\">Footnote<\/h3>\n<p id=\"f826\">This is not an exhaustive list of all the software engineering concepts, but it is the basis of what we are going to use in the next article. I hope it gives you a good flavour of the factors that contribute to building scalable software. Making the application design\u00a0<strong>resilient to changes<\/strong>\u00a0is key to building a successful solution: when the design process is rushed, there is a price to pay at the end of the project when errors are uncovered.<\/p>\n<blockquote><p>Good design is obvious. 
Great design is transparent.<\/p><\/blockquote>\n<p>Originally published in <a href=\"https:\/\/towardsdatascience.com\/being-a-data-scientist-does-not-make-you-a-software-engineer-c64081526372\" class=\"broken_link\" rel=\"noopener\">Medium<\/a><\/p>\n<\/section>\n","protected":false},"excerpt":{"rendered":"<p>Unless you have come into Data Science and Machine Learning (ML) from an&nbsp;IT background&nbsp;and have tangible experience into building enterprise, distributed, solid systems, your Jupyter notebook does not qualify as a great piece of software and sadly does not make you a Software Engineer! This blog shows you how you can build a scalable architecture to surround your witty Data Science solution! This will cover the basics of software engineering with regards to architecture and design and how to apply these on each step of the Machine Learning Pipeline.<\/p>\n","protected":false},"author":556,"featured_media":2788,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[94],"ppma_author":[3236],"class_list":["post-1706","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data-science"],"authors":[{"term_id":3236,"user_id":556,"is_guest":0,"slug":"semi-koen","display_name":"Semi\u00a0Koen Semi\u00a0Koen","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Semi\u00a0Koen","first_name":"Semi\u00a0Koen","job_title":"","description":"Semi Koen&nbsp;is Director | Technical Architect, Investment Banking at Mizuho 
International."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1706","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/556"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1706"}],"version-history":[{"count":2,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1706\/revisions"}],"predecessor-version":[{"id":31206,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1706\/revisions\/31206"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/2788"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1706"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1706"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1706"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1706"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}