<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<title type="text">Renien John Joseph</title>
<generator uri="https://github.com/mojombo/jekyll">Jekyll</generator>
<link rel="self" type="application/atom+xml" href="/feed.xml" />
<link rel="alternate" type="text/html" href="/" />
<updated>2025-12-31T16:15:39+00:00</updated>
<id>/</id>
<author>
  <name>Renien Joseph</name>
  <uri>/</uri>
  <email>renien.john@email.com</email>
</author>


<entry>
  <title type="html"><![CDATA[From Data Platforms to Agentic AI: Building AI-Native Platforms at Scale 🚀]]></title>
  <link rel="alternate" type="text/html" href="/articles/from-data-platform-to-agentic-ai/"/>
  <id>/articles/from-data-platform-to-agentic-ai</id>
  <published>2025-12-31T19:39:55+00:00</published>
  <updated>2025-12-31T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#DataPlatform" term="DataPlatform" /><category scheme="/tags/#AI" term="AI" /><category scheme="/tags/#AgenticAI" term="AgenticAI" /><category scheme="/tags/#Architecture" term="Architecture" /><category scheme="/tags/#MLOps" term="MLOps" /><category scheme="/tags/#Observability" term="Observability" />
  <content type="html">
    &lt;div class=&quot;github-fork-ribbon&quot; style=&quot;position: fixed;padding: 2px 0;background-color: #000;background-image: linear-gradient(to bottom, rgba(0, 0, 0, 0), rgba(0, 0, 0, 0.15));box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.5);z-index: 9999;pointer-events: auto;top: 42px;right: -43px;transform: rotate(45deg);&quot;&gt;
  &lt;a href=&quot;https://github.com/Renien&quot; target=&quot;_blank&quot; style=&quot;font: 700 13px Helvetica, Arial, sans-serif;color: #fff;text-decoration: none;text-align: center;width: 200px;line-height: 20px;display: inline-block;&quot;&gt;
     Fork me on GitHub
  &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;looking-back-building-data-foundations&quot;&gt;Looking Back: Building Data Foundations&lt;/h2&gt;

&lt;p&gt;My work in &lt;strong&gt;Data and AI&lt;/strong&gt; began with a deceptively simple question:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;How do you build data platforms that scale with the business?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In 2023, most of my focus was on:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Designing and operating &lt;strong&gt;large-scale data platforms&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Improving &lt;strong&gt;data reliability, freshness, and trust&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Enabling &lt;strong&gt;analytics, experimentation, and machine learning&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Reducing friction between &lt;strong&gt;engineering, product, and data teams&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that stage, success wasn’t measured by how advanced the models were, but by whether &lt;strong&gt;data systems were dependable enough to support real decisions&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This period reinforced a belief I still hold strongly:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Strong AI systems are built on strong data platforms.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;elt-based-big-data-stacks-as-a-foundation&quot;&gt;ELT-Based Big Data Stacks as a Foundation&lt;/h2&gt;

&lt;p&gt;One of the most impactful architectural decisions was adopting &lt;strong&gt;ELT-based big data stacks&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;ELT enabled:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Scalable ingestion of &lt;strong&gt;diverse data sources&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Clear separation of &lt;strong&gt;data movement&lt;/strong&gt; from &lt;strong&gt;data modeling&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Faster iteration on transformations and schemas&lt;/li&gt;
  &lt;li&gt;Better support for &lt;strong&gt;analytics and ML workloads&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach allowed teams to focus on &lt;strong&gt;using data&lt;/strong&gt;, not fighting pipelines.&lt;br /&gt;
More importantly, it created a &lt;strong&gt;flexible foundation&lt;/strong&gt; capable of supporting increasingly advanced AI use cases.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;high-level-platform-architecture&quot;&gt;High-Level Platform Architecture&lt;/h2&gt;

&lt;p&gt;The diagram below shows how &lt;strong&gt;ELT-based data platforms evolve into AI-native platforms&lt;/strong&gt; that support agents.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
  &lt;a href=&quot;/articles/data-ai-agent-platform.png&quot;&gt;
    &lt;img src=&quot;/articles/data-ai-agent-platform.png&quot; alt=&quot;image&quot; /&gt;
  &lt;/a&gt;
  &lt;figcaption&gt;AI-Native Platform at Scale (High Level)&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;from-predictive-models-to-agentic-systems&quot;&gt;From Predictive Models to Agentic Systems&lt;/h2&gt;

&lt;p&gt;Traditional ML systems fit neatly into &lt;strong&gt;batch processing&lt;/strong&gt; or &lt;strong&gt;request–response APIs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agentic AI changes the paradigm.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agentic systems:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Reason continuously over time&lt;/li&gt;
  &lt;li&gt;Plan and execute &lt;strong&gt;multi-step actions&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Use tools and data dynamically&lt;/li&gt;
  &lt;li&gt;Learn from outcomes and feedback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This reframes AI from &lt;strong&gt;prediction services&lt;/strong&gt; into &lt;strong&gt;systems that think and act&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;At this stage, &lt;strong&gt;platform design becomes more important than any individual model&lt;/strong&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;data-platforms-as-cognitive-infrastructure&quot;&gt;Data Platforms as Cognitive Infrastructure&lt;/h2&gt;

&lt;p&gt;Agentic AI raises expectations for what data platforms must provide.&lt;/p&gt;

&lt;p&gt;Beyond storage and pipelines, platforms now need to support:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Contextual access to &lt;strong&gt;historical and real-time data&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Clear &lt;strong&gt;data semantics and lineage&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Durable &lt;strong&gt;memory grounded in reliable storage&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Feedback loops tied to &lt;strong&gt;real outcomes&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Strong &lt;strong&gt;access controls and safety boundaries&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data platforms become:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Cognitive infrastructure that grounds autonomous intelligence in reality&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They are what make agent reasoning &lt;strong&gt;auditable, explainable, and trustworthy&lt;/strong&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;open-source-observability-making-ai-operable&quot;&gt;Open-Source Observability: Making AI Operable&lt;/h2&gt;

&lt;p&gt;As systems become more autonomous, &lt;strong&gt;observability becomes foundational&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A significant part of my work has involved deploying &lt;strong&gt;open-source observability stacks&lt;/strong&gt; spanning data, ML, and agents.&lt;/p&gt;

&lt;p&gt;These stacks monitor:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Data pipelines and freshness&lt;/li&gt;
  &lt;li&gt;Feature generation and serving&lt;/li&gt;
  &lt;li&gt;Model performance, drift, and bias&lt;/li&gt;
  &lt;li&gt;Agent reasoning, decisions, and tool usage&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;observability-across-data-ml-and-agents&quot;&gt;Observability Across Data, ML, and Agents&lt;/h2&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
  &lt;a href=&quot;/articles/ai-platform-observability.png&quot;&gt;
    &lt;img src=&quot;/articles/ai-platform-observability.png&quot; alt=&quot;image&quot; /&gt;
  &lt;/a&gt;
  &lt;figcaption&gt;Observability for Data + ML + Agents&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Open-source observability provides &lt;strong&gt;transparency, extensibility, support for regulatory compliance, and deep platform integration.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;More importantly, it turns AI from a &lt;strong&gt;black box&lt;/strong&gt; into an &lt;strong&gt;operable system.&lt;/strong&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;data-ml-and-agent-apis-as-platform-interfaces&quot;&gt;Data, ML, and Agent APIs as Platform Interfaces&lt;/h2&gt;

&lt;p&gt;At scale, intelligence must be exposed through &lt;strong&gt;clear, stable interfaces&lt;/strong&gt;.&lt;br /&gt;
Modern platforms achieve this through a combination of data, ML, and agent APIs that separate implementation from usage while enabling governance, control, and observability.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;platform-apis-data-ml-and-agent-interfaces&quot;&gt;Platform APIs: Data, ML, and Agent Interfaces&lt;/h2&gt;

&lt;p&gt;Modern intelligent platforms typically provide:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Data APIs&lt;/strong&gt; for features, aggregates, and real-time signals&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;ML APIs&lt;/strong&gt; for predictions, embeddings, and scoring&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Agent APIs&lt;/strong&gt; for tool access, memory, and controlled actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These APIs are where &lt;strong&gt;governance meets autonomy&lt;/strong&gt;—ensuring that intelligent systems remain auditable and controllable.&lt;/p&gt;
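
&lt;p&gt;As a rough illustration of an agent API with controlled actions, here is a minimal Python sketch. Everything in it is hypothetical and not taken from a real platform or library: the &lt;code&gt;ToolGateway&lt;/code&gt; class, the tool names, and the audit record fields are assumptions made for the example. The idea is that an agent can only call tools that are registered and allowed, and every attempt is recorded.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class ToolGateway:
    """Hypothetical agent-facing API: a tool is callable only if registered and allowed."""

    tools: dict = field(default_factory=dict)
    allowed: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def register(self, name, func, allowed=True):
        # Registering a tool does not automatically expose it to agents.
        self.tools[name] = func
        if allowed:
            self.allowed.add(name)

    def call(self, agent_id, name, **kwargs):
        permitted = name in self.tools and name in self.allowed
        # Record every attempt, permitted or not, so behavior stays auditable.
        self.audit_log.append(
            {"agent": agent_id, "tool": name, "args": kwargs, "permitted": permitted}
        )
        if not permitted:
            raise PermissionError(f"tool '{name}' is not available to agents")
        return self.tools[name](**kwargs)


gateway = ToolGateway()
# Illustrative tools only; a real platform would call its Data or ML APIs here.
gateway.register("lookup_feature", lambda key: {"key": key, "value": 0.42})
gateway.register("delete_table", lambda table: None, allowed=False)

result = gateway.call("agent-1", "lookup_feature", key="user_7d_spend")
try:
    gateway.call("agent-1", "delete_table", table="orders")
except PermissionError:
    pass  # the blocked attempt is still present in the audit log

print(len(gateway.audit_log))
```

&lt;p&gt;Keeping the allow-list and the audit log in the gateway, rather than in each tool, is one way to give governance a single place to live.&lt;/p&gt;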

&lt;hr /&gt;

&lt;h2 id=&quot;architecture-overview&quot;&gt;Architecture Overview&lt;/h2&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
  &lt;a href=&quot;/articles/ai-platform-apis.png&quot;&gt;
    &lt;img src=&quot;/articles/ai-platform-apis.png&quot; alt=&quot;image&quot; /&gt;
  &lt;/a&gt;
  &lt;figcaption&gt;Platform APIs: Data, ML, and Agent Interfaces&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;APIs decouple intelligence from implementation details and make &lt;strong&gt;agent behavior controllable and auditable.&lt;/strong&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;elt-apis-and-observability-enabling-agentic-ai&quot;&gt;ELT, APIs, and Observability: Enabling Agentic AI&lt;/h2&gt;

&lt;p&gt;Agentic AI does not emerge from models alone—it is the result of &lt;strong&gt;platform design choices&lt;/strong&gt;.&lt;br /&gt;
A reliable foundation is created by combining three core capabilities:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;ELT-based data platforms&lt;/strong&gt; that support scalable, replayable, and auditable data flows&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Strong Data and ML APIs&lt;/strong&gt; that expose features, predictions, and embeddings through stable interfaces&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Open-source observability tools&lt;/strong&gt; that provide visibility into decisions, actions, and outcomes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, these components transform AI from isolated predictions into &lt;strong&gt;operational intelligence&lt;/strong&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;what-this-enables&quot;&gt;What This Enables&lt;/h2&gt;

&lt;p&gt;A platform built on ELT, APIs, and observability allows teams to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Roll out intelligent systems incrementally&lt;/strong&gt;, starting with human-in-the-loop workflows&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Evaluate human and agent decisions side by side&lt;/strong&gt;, using the same data and metrics&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Continuously monitor, learn, and improve&lt;/strong&gt;, based on real-world feedback&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Build trust through visibility and control&lt;/strong&gt;, rather than blind automation&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;scaling-intelligence-responsibly&quot;&gt;Scaling Intelligence Responsibly&lt;/h2&gt;

&lt;p&gt;This approach ensures that intelligence scales in a way that is:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Powerful&lt;/strong&gt;, by leveraging modern data and ML capabilities&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Transparent&lt;/strong&gt;, through observable behavior and decision traces&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Accountable&lt;/strong&gt;, with clear ownership, governance, and guardrails&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is how agentic systems move from experimentation to production—&lt;br /&gt;
&lt;strong&gt;not as black boxes, but as trusted, auditable platform capabilities&lt;/strong&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;looking-toward-2026-building-ai-native-platforms&quot;&gt;Looking Toward 2026: Building AI-Native Platforms&lt;/h2&gt;

&lt;p&gt;Looking ahead, my focus is shifting toward &lt;strong&gt;AI-native platform design&lt;/strong&gt;—systems where intelligence is not an add-on, but a foundational capability.&lt;/p&gt;

&lt;p&gt;Rather than centering on individual models, AI-native platforms are built to support &lt;strong&gt;reasoning, action, and learning as continuous processes&lt;/strong&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;areas-of-focus&quot;&gt;Areas of Focus&lt;/h2&gt;

&lt;p&gt;I’m particularly interested in advancing platforms that emphasize:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Agent-Aware Data Architectures&lt;/strong&gt;&lt;br /&gt;
Data systems designed to support planning, memory, feedback loops, and long-running agent workflows.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Observability for Reasoning and Actions&lt;/strong&gt;&lt;br /&gt;
Visibility not just into outputs, but into &lt;em&gt;why&lt;/em&gt; decisions were made and &lt;em&gt;how&lt;/em&gt; actions unfolded over time.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Data, ML, and Agent APIs as First-Class Citizens&lt;/strong&gt;&lt;br /&gt;
Stable interfaces that make intelligence accessible, governable, and composable across the organization.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Human-in-the-Loop Autonomy&lt;/strong&gt;&lt;br /&gt;
Systems that blend automation with oversight, enabling agents to act independently while remaining accountable.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Intelligence as Infrastructure&lt;/strong&gt;&lt;br /&gt;
Platforms that treat reasoning, learning, and decision-making as shared capabilities—just like compute, storage, and networking.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;a-platform-first-future&quot;&gt;A Platform-First Future&lt;/h2&gt;

&lt;p&gt;The future of AI will be defined &lt;strong&gt;less by individual models&lt;/strong&gt; and &lt;strong&gt;more by the systems that surround them&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;AI-native platforms will be the differentiator—enabling organizations to scale intelligence responsibly, adapt continuously, and build trust in autonomous systems.&lt;/p&gt;

&lt;p&gt;This is the direction where durable, production-grade AI will be built.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;/articles/from-data-platform-to-agentic-ai/&quot;&gt;From Data Platforms to Agentic AI: Building AI-Native Platforms at Scale 🚀&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on December 31, 2025.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Reflection on 2023 🚀]]></title>
  <link rel="alternate" type="text/html" href="/articles/refelction-on-2023/"/>
  <id>/articles/refelction-on-2023</id>
  <published>2023-12-31T19:39:55+00:00</published>
  <updated>2023-12-31T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#DataPlatform" term="DataPlatform" /><category scheme="/tags/#AI" term="AI" /><category scheme="/tags/#Growth" term="Growth" />
  <content type="html">

&lt;p&gt;The end of a year is the beginning of a new journey with exciting plans. It’s been a great year full of learning, which naturally leads to reflection and plans for the coming year.&lt;/p&gt;

&lt;p&gt;Friends and subscribers of my blog know the whole journey of our Data Platform-in-a-Box. Personally, it has changed how I look at data platforms and data teams compared with 2021: &lt;a href=&quot;http://renien.com/articles/data-platform-in-a-box/&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;Read More&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;It used to be a BIG mammoth and now it’s packed into a SMALL box 😀&lt;/p&gt;
&lt;/blockquote&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/mammoth-box.png&quot;&gt;&lt;img src=&quot;/articles/mammoth-box.png&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Transition journey&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;I never thought building and operating a Data Platform would become as easy as pie. The key to success is stepping out of your comfort zone, which means learning new things, meeting new people, seeing new places, and trying new experiences.&lt;/p&gt;

&lt;p&gt;Most importantly, stay consistent on the following:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Keep your bar high…&lt;/li&gt;
  &lt;li&gt;Incrementally raise the bar…&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Achieving all your aspirational goals is never easy. It’s a lifelong journey, and it will be a bumpy ride; keep breaking through the obstacles so you can climb the next mountain. But always remember to celebrate the small victories.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Next year, 2024, is going to be a transition journey to Data &amp;amp; AI in-a-Box.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/transition.png&quot;&gt;&lt;img src=&quot;/articles/transition.png&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Data &amp;amp; AI Transition / Data &amp;amp; AI in-a-Box&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;I will keep sharing the journey with Data &amp;amp; AI in-a-Box. Keep following the blog, and follow my GitHub profile to get notified about new projects.&lt;/p&gt;

&lt;p&gt;Stay Tuned!&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I can’t wait to crush 2024 together… cheers! 🍺&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;&lt;a href=&quot;/articles/refelction-on-2023/&quot;&gt;Reflection on 2023 🚀&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on December 31, 2023.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[[Modern] On-Prem Data Platform]]></title>
  <link rel="alternate" type="text/html" href="/articles/modern-onprem-data-platform/"/>
  <id>/articles/modern-onprem-data-platform</id>
  <published>2023-08-27T19:39:55+00:00</published>
  <updated>2023-08-27T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#DataPlatform" term="DataPlatform" /><category scheme="/tags/#OnPrem" term="OnPrem" /><category scheme="/tags/#ELT" term="ELT" /><category scheme="/tags/#Trino" term="Trino" /><category scheme="/tags/#Presto" term="Presto" />
  <content type="html">
    &lt;p&gt;An on-premises data warehouse refers to a data storage and management solution that is physically hosted within an organization’s own data center or facilities, as opposed to being hosted in the cloud.&lt;/p&gt;

&lt;p&gt;In recent years, the trend has been shifting toward cloud-based data warehousing solutions due to their scalability, flexibility, and reduced upfront costs. However, on-premises data warehouses continue to be relevant for organizations with specific requirements around data control, security, compliance, and performance.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
  &lt;a href=&quot;/articles/data-privacy.png&quot;&gt;&lt;img src=&quot;/articles/data-privacy.png&quot; alt=&quot;image&quot; width=&quot;25%&quot; height=&quot;25%&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Data Privacy&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Recently I’ve been designing on-prem data platforms to comply with data residency laws. When it comes to on-prem data platforms, the first thing a data geek thinks about is managing Hadoop clusters. But the intention of a SaaS data platform is to reduce operational overhead, and managing Hadoop clusters is a big overhead in itself.&lt;/p&gt;

&lt;p&gt;If the need is only canned reports, an RDBMS can handle that easily. But nowadays a data-centric approach is key for any business. If we really want to run a data-driven business, it’s vital to have a proper data lake, data warehouse, or data mesh strategy to enable it.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/hadoop-dp.png&quot;&gt;&lt;img src=&quot;/articles/hadoop-dp.png&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Hadoop Data Lake &amp;amp; Warehouse&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Technology growth has changed many pieces of the big data and data platform landscape. As I said, managing Hadoop clusters is overkill for a company and its team. So we need reliable data lake storage similar to cloud bucket storage.&lt;/p&gt;

&lt;p&gt;MinIO and Ceph/Rook are both very popular open-source distributed storage systems that are easy to manage and run on top of a Kubernetes cluster. This changed the whole way of thinking that came with the Hadoop ecosystem. In our recent on-prem solution we dropped the Hadoop ecosystem entirely and built a fully managed on-prem data platform on top of a Kubernetes cluster.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
  &lt;a href=&quot;/articles/s3-bucket.png&quot;&gt;&lt;img src=&quot;/articles/s3-bucket.png&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;S3-like bucket&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Managing and querying big data on Hadoop can be challenging, so we looked at alternatives such as MinIO and Ceph S3 object stores, which you can access and query with Presto or Trino. We ended up relying on S3 object stores deployed on Kubernetes clusters, together with the Hive metastore and Trino, for data warehousing.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/On-Prem-DWH-Solution.png&quot;&gt;&lt;img src=&quot;/articles/On-Prem-DWH-Solution.png&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Modern On-Prem DWH Solution&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;So far the journey looks really great: our motto of building a &lt;a href=&quot;/articles/data-platform-in-a-box/&quot;&gt;&lt;strong&gt;Data Platform-in-a-Box&lt;/strong&gt;&lt;/a&gt; is becoming real, and we are enjoying the performance and scalability of the Kubernetes cluster.&lt;/p&gt;

&lt;p&gt;The current on-prem setup is roughly equivalent to what cloud data platforms offer. One piece missing from this post is masking policies and solutions. We have already solved that as well, and I will explain it in my next blog post.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Stay Tuned!… Cheers! 🍺&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;&lt;a href=&quot;/articles/modern-onprem-data-platform/&quot;&gt;[Modern] On-Prem Data Platform&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on August 27, 2023.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Introducing Light-Weight (EL) Extractor Loader]]></title>
  <link rel="alternate" type="text/html" href="/articles/light-weight-et/"/>
  <id>/articles/light-weight-et</id>
  <published>2023-02-05T19:39:55+00:00</published>
  <updated>2023-02-05T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#DataPlatform" term="DataPlatform" /><category scheme="/tags/#Cloud%20Agnostic%20Platform" term="Cloud Agnostic Platform" /><category scheme="/tags/#ELT" term="ELT" /><category scheme="/tags/#MODERN%20DWH" term="MODERN DWH" />
  <content type="html">
    &lt;p&gt;The data engineering field is evolving and transforming in many ways, driven by the increasing amount of data being generated and the need to &lt;strong&gt;MAKE SENSE&lt;/strong&gt; of it. In data engineering there are two approaches, ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform), that are used to move data between two systems. ETL is the traditional approach, where data is transformed before it is loaded. ELT is a more modern approach, influenced by cloud platforms, where the data is first extracted from the source system, loaded into the target system, and only then transformed.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
  &lt;a href=&quot;/articles/ELT-diagram.png&quot;&gt;&lt;img src=&quot;/articles/ELT-diagram.png&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Extract Load Transform&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;In the ELT approach, data extraction is always a challenging part for data engineers. Data extraction refers to the process of retrieving specific information from a large or small dataset. It can be done manually or with the help of various tools and techniques.&lt;/p&gt;

&lt;p&gt;Sqoop is one of the important tools used across the industry. I was one of Sqoop’s biggest fans, and for many years it helped me extract data from relational databases. We still rely on Sqoop in most of our legacy platforms.&lt;/p&gt;

&lt;p&gt;In this journey, however, Sqoop was not the right tool for us, and it has since been retired by the Apache community as well.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/retired-sqoop.PNG&quot;&gt;&lt;img src=&quot;/articles/retired-sqoop.PNG&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Apache Sqoop - Retired&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;So I had to look for a better tool to extract data from multiple sources, and I found a lot of proprietary tools that cost a lot of money. But after a while in this field, we always come up with crazy ideas. One of them was to create a &lt;strong&gt;self-service ELT tool&lt;/strong&gt; to support our data platform.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
  &lt;a href=&quot;/articles/self-service.jpg&quot;&gt;&lt;img src=&quot;/articles/self-service.jpg&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Self Service Product&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;After a couple of months, we managed to build a &lt;strong&gt;lightweight EXTRACTOR&amp;lt;&amp;gt;LOADER tool&lt;/strong&gt; that supports &lt;strong&gt;any type of database&lt;/strong&gt;. The outcome was great, and the performance of the &lt;strong&gt;EXTRACTOR-LOADER&lt;/strong&gt; is really impressive.&lt;/p&gt;

&lt;p&gt;The concept relies on &lt;strong&gt;old-school textbook theory, which we adapted a bit to create the lightweight library&lt;/strong&gt;. You can build it in any programming language you prefer; we mostly relied on Python.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/producer-consumer-restaurant.webp&quot;&gt;&lt;img src=&quot;/articles/producer-consumer-restaurant.webp&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Producer Consumer Restaurant Example [Ref:https://levelup.gitconnected.com/producer-consumer-problem-using-mutex-in-c-764865c47483]&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;The idea was triggered by the classic producer-consumer concept. As shown in the figure above, the cook keeps preparing food and placing it on a shared plate, and the kid keeps enjoying the food from that plate.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/producer-consumer-pattern.png&quot;&gt;&lt;img src=&quot;/articles/producer-consumer-pattern.png&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Producer Consumer Pattern For Data Ingestion Problem&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Similarly, we can apply the producer-consumer approach to our EXTRACTOR LOADER problem. The extractor job is the producer, which extracts data from the source, and the loader job is the consumer, which keeps loading the extracted data whenever it is available in the shared resource.&lt;/p&gt;
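
&lt;p&gt;To make the pattern concrete, here is a minimal Python sketch with one producer and one consumer. It is an illustration under assumptions, not the actual library: the function names, the batch size, the in-memory list standing in for the target database, and the use of &lt;code&gt;queue.Queue&lt;/code&gt; as the shared resource are all hypothetical choices made for the example.&lt;/p&gt;

```python
import queue
import threading

SENTINEL = None  # placed on the buffer to signal that extraction is finished


def extractor(source_rows, buffer, batch_size=2):
    """Producer: read rows from the source and put them on the shared buffer in batches."""
    batch = []
    for row in source_rows:
        batch.append(row)
        if len(batch) == batch_size:
            buffer.put(batch)  # blocks while the buffer is full, giving back-pressure
            batch = []
    if batch:
        buffer.put(batch)  # flush the final partial batch
    buffer.put(SENTINEL)


def loader(buffer, target):
    """Consumer: take batches off the shared buffer and load them into the target."""
    while True:
        batch = buffer.get()  # blocks until the extractor has produced something
        if batch is SENTINEL:
            break
        target.extend(batch)  # stand-in for a bulk insert into the target system


def run_extract_load(source_rows, max_buffered_batches=4):
    """Run one extractor and one loader concurrently and return the loaded rows."""
    buffer = queue.Queue(maxsize=max_buffered_batches)  # the shared resource
    target = []
    producer = threading.Thread(target=extractor, args=(source_rows, buffer))
    consumer = threading.Thread(target=loader, args=(buffer, target))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return target


rows = [{"id": i} for i in range(5)]
loaded = run_extract_load(rows)
print(len(loaded))
```

&lt;p&gt;The bounded queue means the extractor pauses when the loader falls behind, so memory use stays limited no matter how large the source table is.&lt;/p&gt;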

&lt;figure style=&quot;text-align: center;&quot;&gt;
  &lt;a href=&quot;/articles/light-weight.PNG&quot;&gt;&lt;img src=&quot;/articles/light-weight.PNG&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Light-Weight Framework&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Currently, with a single producer and a single consumer, the LIGHT-WEIGHT library is already showing a massive improvement in performance. Our next step is to extend the framework to a multi-producer, multi-consumer solution to further increase the performance of &lt;strong&gt;EXTRACTOR LOADER&lt;/strong&gt; tasks. If we manage to crack it, I will share the concept on my blog.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Stay Tuned!… Cheers! 🍺&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;&lt;a href=&quot;/articles/light-weight-et/&quot;&gt;Introducing Light-Weight (EL) Extractor Loader&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on February 05, 2023.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Data Platform-in-a-Box]]></title>
  <link rel="alternate" type="text/html" href="/articles/data-platform-in-a-box/"/>
  <id>/articles/data-platform-in-a-box</id>
  <published>2022-12-14T19:39:55+00:00</published>
  <updated>2022-12-14T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#DataPlatform" term="DataPlatform" /><category scheme="/tags/#Cloud%20Agnostic%20Platform" term="Cloud Agnostic Platform" /><category scheme="/tags/#Foundation%20for%20Data%20Science" term="Foundation for Data Science" /><category scheme="/tags/#DevOps" term="DevOps" />
  <content type="html">

&lt;p&gt;The software industry keeps evolving, and everyone wants quick delivery, much like fast food. To stay competitive, organizations are working to build highly scalable products and services.&lt;/p&gt;

&lt;p&gt;I recently prepared both a quick fast-food snack and a full meal with several dishes. A full meal stays in our memory far longer than fast food. Similarly, a well-designed product helps a company scale by attracting more customers, and it gives the flexibility to customize for their choices and needs. Look at the pictures below, potato wedges vs. a banana-leaf meal: who wouldn’t prefer a full meal with different flavors 😀?&lt;/p&gt;

&lt;figure class=&quot;half&quot; style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/potato-wedges.png&quot;&gt;&lt;img src=&quot;/articles/potato-wedges.png&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
	&lt;a href=&quot;/articles/full-meals.png&quot;&gt;&lt;img src=&quot;/articles/full-meals.png&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Fast Food vs Full Meals&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Data has been a critical component from the very beginning. The data platform is gradually evolving to support the mainstream systems and give power back to the business. Being in a hyper-growth SaaS company is a fun ride, but building a data platform to serve the SaaS business came with several key challenges. It’s like sailing in a rough sea 🙂.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/rough-sea.PNG&quot;&gt;&lt;img src=&quot;/articles/rough-sea.PNG&quot; alt=&quot;image&quot; style=&quot;
    width: 90%;
    height: 90%;&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Sailing in the rough sea&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;There were a lot of questions to answer to achieve this mission:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;How to manage multiple flavors of data warehouse?&lt;/li&gt;
  &lt;li&gt;How to maintain multiple data platforms/multiple code bases?&lt;/li&gt;
  &lt;li&gt;How about the data accuracy?&lt;/li&gt;
  &lt;li&gt;How about managing PII data and data governance?&lt;/li&gt;
  &lt;li&gt;How do we maintain the systems? Do we need a big army?&lt;/li&gt;
  &lt;li&gt;How fast can we bring up a data platform?&lt;/li&gt;
  &lt;li&gt;How can we build a platform to empower data science to work?&lt;/li&gt;
&lt;/ol&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/how.jfif&quot;&gt;&lt;img src=&quot;/articles/how.jfif&quot; alt=&quot;image&quot; style=&quot;
    width: 50%;
    height: 50%;&quot; /&gt;&lt;/a&gt;
&lt;/figure&gt;

&lt;p&gt;The answer relies on two strategies.&lt;/p&gt;

&lt;h3 id=&quot;strategy-1-build-a-team-with-a-culture-of-engineering-excellence&quot;&gt;Strategy 1: Build a team with a culture of engineering excellence&lt;/h3&gt;

&lt;p&gt;Culture is not something you build overnight; it’s created by everyone and everything that happens on your team. It’s a journey with lots of ups and downs. But if you have a clear vision, explain it to your team, and demonstrate it in your day-to-day work, then after a while the people around you will understand its importance, and gradually their actions will influence others as well.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Motto: “Should be cathedral builders, not bricklayers”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3 id=&quot;strategy-2-build-data-platform-in-a-box&quot;&gt;Strategy 2: Build data platform in a box&lt;/h3&gt;

&lt;blockquote&gt;
  &lt;p&gt;Motto: “Data platform in one button click with 360 view on quality”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Data is scattered all around the platform and around us. The following are the key principles of a modern data platform.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;View data as a shared asset&lt;/li&gt;
  &lt;li&gt;Provide the right interfaces to the users to consume data&lt;/li&gt;
  &lt;li&gt;Ensure the security of data and access controls&lt;/li&gt;
  &lt;li&gt;Establish a common vocabulary&lt;/li&gt;
  &lt;li&gt;Curate the data&lt;/li&gt;
  &lt;li&gt;Eliminate data copies and movements&lt;/li&gt;
  &lt;li&gt;Data quality&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Following these principles is essential to building a stable data platform.&lt;/p&gt;

&lt;p&gt;It’s important to first build a stable and reliable data platform, then bring a DevOps culture to the team to bundle the platform into a &lt;strong&gt;BOX&lt;/strong&gt;.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/de-c-devops.PNG&quot;&gt;&lt;img src=&quot;/articles/de-c-devops.PNG&quot; alt=&quot;image&quot; style=&quot;
    width: 90%;
    height: 90%;&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Upskill your team!&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;The &lt;strong&gt;‘data platform in a box’&lt;/strong&gt; journey demands not only DevOps skills but also knowledge of data tools, architectures, and solutions. So the only option is to upskill the data team in DevOps to &lt;strong&gt;MAKE IT HAPPEN&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In fact, without achieving Strategy 1, we will not be able to achieve Strategy 2.&lt;/p&gt;

&lt;p&gt;If I rewind half a decade, my love for big data and data science was at its peak, and I still haven’t lost interest, because the field keeps evolving and keeps me engaged with very complex systems.&lt;/p&gt;

&lt;p&gt;But the journey of building a &lt;strong&gt;‘data platform in a box’&lt;/strong&gt; expanded my interest in DevOps, and it will be part of my journey forever. Understanding DevOps to control the big data and data science playground just drives me crazy.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/devops-love.png&quot;&gt;&lt;img src=&quot;/articles/devops-love.png&quot; alt=&quot;image&quot; style=&quot;
    width: 50%;
    height: 50%;&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Data + Data Science + DevOps 💗&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;This year’s journey taught me so many things, not only on the tech side but also life lessons. I’m sorry I was unable to spend time sharing the ground-level work on how to build the data platform in a box; I was really busy building it. But for sure, I will spend some time next year sharing some interesting topics on &lt;strong&gt;‘data platform with devops’&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I will share sample templates and code on my GitHub to kick-start the journey with data platform tools and DevOps. Follow my GitHub profile to get notified about new projects.&lt;/p&gt;

&lt;p&gt;Stay Tuned!&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I can’t wait to crush 2023 together… cheers! 🍺&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;&lt;a href=&quot;/articles/data-platform-in-a-box/&quot;&gt;Data Platform-in-a-Box&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on December 14, 2022.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Life Hacks #1: My Wife Starting To Like Linux Distro]]></title>
  <link rel="alternate" type="text/html" href="/articles/life-hack-1/"/>
  <id>/articles/life-hack#1</id>
  <published>2022-05-28T19:39:55+00:00</published>
  <updated>2022-05-28T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#Linux" term="Linux" /><category scheme="/tags/#Windows" term="Windows" /><category scheme="/tags/#Life" term="Life" />
  <content type="html">
    &lt;p&gt;During working hours or on an office call, my family looks at me a bit differently. Because I am always looking at Linux terminals, my wife keeps teasing me for being a weird guy! For the past two years I have tried to explain it in different ways, but she still keeps asking the same question.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;What is this black screen that you are staring at?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Almost two years back (2019), the coronavirus (COVID-19) affected all aspects of society and all dimensions of sustainable development. It also created a paradigm shift in our work life, as we had to work from home. We were not used to it and faced multiple problems. But there is a saying:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Difficult Times Can Bring Out the Greatest Innovations&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Due to the sudden lockdowns, my wife had to work from home. She had worked at the office all these years and was only familiar with Windows PCs. Since her work involves personal financial data, the company asked her to install &lt;a href=&quot;https://www.citrix.com/&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;Citrix System&lt;/strong&gt;&lt;/a&gt; on her personal machine to work in a cloud environment.&lt;/p&gt;

&lt;p&gt;Unfortunately, we didn’t have a high-end Windows 10 laptop to support that Citrix version. We had an old machine that worked well with Windows 7, and now all the tech-related complaints came to me. I tried many different versions of Citrix, but none was compatible with the Windows 7 machine; the machine was also performing poorly and needed a fresh install. And my current laptop’s specs do not support Windows 10/11 😢…&lt;/p&gt;

&lt;p&gt;So I took this chance to experiment and introduce a Linux-based system to my wife 😀. First I installed elementary OS, because I thought it would be good for beginners and she could learn it quickly. But I had a lot of trouble finding the right drivers for audio and Wi-Fi.&lt;/p&gt;

&lt;p&gt;After almost 4-5 hours I gave up and installed Ubuntu, and with the help of the open source community I managed to install all the drivers our old laptop needed. But then came another problem: the Citrix application did not support the Linux-based system.&lt;/p&gt;

&lt;p&gt;😓&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;When one door closes, another door opens&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My experiment box is always a virtual machine (VM). So I set up a VM on the Ubuntu machine and installed Windows 11 in it, and it worked without any issues 🎇. Finally, I managed to set up Citrix successfully on Windows 11.&lt;/p&gt;

&lt;figure class=&quot;half&quot; style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/dell-laptop.jpeg&quot;&gt;&lt;img src=&quot;/articles/dell-laptop.jpeg&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
	&lt;a href=&quot;/articles/dell-laptop-with-linux.jpeg&quot;&gt;&lt;img src=&quot;/articles/dell-laptop-with-linux.jpeg&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Dell with Linux&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;This was a great chance to show the power of open source technology. She is getting familiar with Ubuntu and has started to like it a lot 😃&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Yes! she’s a keeper!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Key learning:
we shouldn’t force an OS on anyone. People should find out for themselves whether it suits their needs.&lt;/p&gt;


    &lt;p&gt;&lt;a href=&quot;/articles/life-hack-1/&quot;&gt;Life Hacks #1: My Wife Starting To Like Linux Distro&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on May 28, 2022.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[SaaS - Data Platform]]></title>
  <link rel="alternate" type="text/html" href="/articles/saas-data-platform/"/>
  <id>/articles/saas-data-platform</id>
  <published>2022-05-15T19:39:55+00:00</published>
  <updated>2022-05-15T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#Data%20Science" term="Data Science" /><category scheme="/tags/#Big%20Data" term="Big Data" /><category scheme="/tags/#DevOps" term="DevOps" /><category scheme="/tags/#SRE" term="SRE" />
  <content type="html">
    &lt;p&gt;Becoming a good software engineer means being a good thinker. But becoming a master means being willing to take on new challenges and improve the way we think, right? I think that applies to any field of work. Recently I tried out some unique dishes and they turned out really well! 😊 My wife and little princess 👼 liked them a lot and approved 💯&lt;/p&gt;

&lt;figure class=&quot;half&quot; style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/wedding-style-biriyani.jpg&quot;&gt;&lt;img src=&quot;/articles/wedding-style-biriyani.jpg&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
	&lt;a href=&quot;/articles/wedding-style-biriyani-veg.jpg&quot;&gt;&lt;img src=&quot;/articles/wedding-style-biriyani-veg.jpg&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Wedding-style Chicken Biryani&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;At a certain point, the obstacles that get in our way help improve our problem-solving skills. When a problem keeps occupying our mind, we should engage with it, find its root cause, and fix it permanently.&lt;/p&gt;

&lt;p&gt;Looking at my journey, my love for solving problems keeps expanding. I started as a desktop application developer, then moved to full-stack development, and ended up settling into big data and data science. You have probably guessed where this is going from my last two articles.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Yes! SRE (DevOps) has been one of my areas of interest for quite some time now.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Site reliability engineering (SRE) was born at Google in 2003, prior to the DevOps movement, when the first team of software engineers was tasked to make Google’s already large-scale sites more reliable, efficient, and scalable.&lt;/p&gt;

&lt;p&gt;At my current workplace, &lt;a href=&quot;https://www.circles.life/sg/&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;Circles.Life&lt;/strong&gt;&lt;/a&gt;, we are creating SaaS (Software as a Service) products and have for some time been focusing on building a &lt;a href=&quot;https://cxos.circles.life/&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;SaaS-Data Platform&lt;/strong&gt;&lt;/a&gt;. To achieve it we need to focus on two key things.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Single Code Base - Highly configurable out of the box, so one code base can deliver multiple launches&lt;/li&gt;
  &lt;li&gt;Automate End-to-End - Covers creating, updating, and deleting infrastructure as well as deploying applications&lt;/li&gt;
&lt;/ol&gt;
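
&lt;p&gt;As a rough illustration of these two points, here is a minimal Python sketch, not the actual platform code. It assumes a hypothetical set of shared defaults and per-launch overrides; every name and setting in it (DEFAULTS, build_launch_config, plan, the keys and values) is made up for the example.&lt;/p&gt;

```python
import copy

# Hypothetical defaults shared by every launch; keys and values are illustrative only.
DEFAULTS = {
    "warehouse": "postgres",
    "region": "ap-southeast-1",
    "workers": 2,
    "pii_masking": True,
}


def build_launch_config(overrides):
    """Merge per-launch overrides onto the shared defaults.

    Unknown keys are rejected so that a typo in one launch's settings
    cannot silently create a divergent platform.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError("unknown setting(s): " + ", ".join(sorted(unknown)))
    config = copy.deepcopy(DEFAULTS)
    config.update(overrides)
    return config


def plan(launch_name, overrides, action="create"):
    """Describe the automation steps for one launch without running them."""
    if action not in ("create", "update", "delete"):
        raise ValueError("unsupported action: " + action)
    config = build_launch_config(overrides)
    steps = [action + " infrastructure for " + launch_name]
    if action != "delete":
        steps.append("deploy applications with " + str(config["workers"]) + " workers")
    return config, steps
```

&lt;p&gt;The design choice is that a new launch only adds a small set of overrides, while the code that creates, updates, or deletes the environment stays the same for everyone.&lt;/p&gt;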

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/devops.png&quot;&gt;&lt;img src=&quot;/articles/devops.png&quot; alt=&quot;image&quot; style=&quot;
    width: 50%;
    height: 50%;&quot; /&gt;&lt;/a&gt;
&lt;/figure&gt;

&lt;p&gt;The current reaction of folks on big data &amp;amp; data science teams will be:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Dude, how can we do it? There are so many tools and applications needed to build a futuristic data platform”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Yes! I know, I was in that state for almost 3 months. But getting to know DevOps and its tools took me to a different world, and I am starting to love the journey of automating the data platform end to end. With a good understanding of big data and data science tools, plus the DevOps magic wands, we will be able to automate it with one button click!&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/devops-magic-wand.png&quot;&gt;&lt;img src=&quot;/articles/devops-magic-wand.png&quot; alt=&quot;image&quot; style=&quot;
    width: 50%;
    height: 50%;&quot; /&gt;&lt;/a&gt;
&lt;/figure&gt;

&lt;p&gt;I have already started posting a few items on DevOps work with the data platform, so stay tuned for more interesting posts.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;PS:&lt;/strong&gt; Becoming new parents just puts your whole world into perspective. Everything else just disappears 😫. But I will try to find some time to share the knowledge.&lt;/em&gt;&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/mom-dad-baby.PNG&quot;&gt;&lt;img src=&quot;/articles/mom-dad-baby.PNG&quot; alt=&quot;image&quot; style=&quot;
    width: 50%;
    height: 50%;&quot; /&gt;&lt;/a&gt;
&lt;/figure&gt;


    &lt;p&gt;&lt;a href=&quot;/articles/saas-data-platform/&quot;&gt;SaaS - Data Platform&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on May 15, 2022.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Trino - Popular Distributed Interactive Query Engine]]></title>
  <link rel="alternate" type="text/html" href="/articles/Trino/"/>
  <id>/articles/Trino</id>
  <published>2021-06-05T19:39:55+00:00</published>
  <updated>2021-06-05T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#Data%20Science" term="Data Science" /><category scheme="/tags/#Big%20Data" term="Big Data" /><category scheme="/tags/#Data" term="Data" /><category scheme="/tags/#Trino" term="Trino" />
  <content type="html">
    &lt;div class=&quot;github-fork-ribbon&quot; style=&quot;position: fixed;padding: 2px 0;background-color: #000;background-image: linear-gradient(to bottom, rgba(0, 0, 0, 0), rgba(0, 0, 0, 0.15));-webkit-box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.5);-moz-box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.5);box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.5);z-index: 9999;pointer-events: auto;top: 42px;right: -43px;-webkit-transform: rotate(45deg);-moz-transform: rotate(45deg);-ms-transform: rotate(45deg);-o-transform: rotate(45deg);transform: rotate(45deg);&quot;&gt;&lt;a href=&quot;https://github.com/Renien/trino-poc&quot; style=&quot;font: 700 13px &amp;quot;Helvetica Neue&amp;quot;, Helvetica, Arial, sans-serif;color: #fff;text-decoration: none;text-shadow: 0 -1px rgba(0, 0, 0, 0.5);text-align: center;width: 200px;line-height: 20px;display: inline-block;padding: 2px 0;border-width: 1px 0;border-style: dotted;border-color: rgba(255, 255, 255, 0.7);&quot;&gt;Fork me on GitHub&lt;/a&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;Trino (formerly PrestoSQL)&lt;/strong&gt;&lt;/a&gt; is a popular distributed, interactive query engine for data platforms. It can be used as a fast SQL query engine over many data sources.&lt;/p&gt;

&lt;p&gt;Trino is becoming one of my favorite data platform components. Here I am going to show you how to deploy and explore Trino on your local machine.&lt;/p&gt;

&lt;p&gt;Below is a simplified diagram of how Trino integrates with the data platform.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;https://raw.githubusercontent.com/Renien/trino-poc/master/doc/trino-architecture.png&quot;&gt;&lt;img src=&quot;https://raw.githubusercontent.com/Renien/trino-poc/master/doc/trino-architecture.png&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
&lt;/figure&gt;

&lt;p&gt;This tool can enrich a &lt;a href=&quot;https://www.gartner.com/smarterwithgartner/data-fabric-architecture-is-key-to-modernizing-data-management-and-integration/&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;Data fabric architecture&lt;/strong&gt;&lt;/a&gt; and a &lt;a href=&quot;https://www.cuelogic.com/blog/data-mesh&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;Data mesh architecture&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The project is built and deployed using Docker. It has three modules: trino-base, which contains the base image, and the two main Trino components:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;trino-coordinator&lt;/li&gt;
  &lt;li&gt;trino-worker&lt;/li&gt;
&lt;/ol&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/folder-structure-comments.png&quot;&gt;&lt;img src=&quot;/articles/folder-structure-comments.png&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
&lt;/figure&gt;

&lt;p&gt;For more detailed deployment steps, please refer to this &lt;a href=&quot;https://github.com/Renien/trino-poc#readme&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;README&lt;/strong&gt;&lt;/a&gt; file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thanks to Trino, we can have one query engine and write federated queries 🔥 that communicate with different data sources 🤯&lt;/strong&gt;&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;https://raw.githubusercontent.com/Renien/trino-poc/master/doc/federation-query-comments.png&quot;&gt;&lt;img src=&quot;https://raw.githubusercontent.com/Renien/trino-poc/master/doc/federation-query-comments.png&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
&lt;/figure&gt;
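
&lt;p&gt;To make the idea of a federated query concrete, here is a minimal Python sketch that only assembles the SQL text; it does not connect to Trino. Trino addresses tables as catalog.schema.table, which is what allows a single statement to join across sources. The catalogs (mysql, hive), schemas, tables, and columns below are hypothetical and are not taken from the POC repository.&lt;/p&gt;

```python
def qualified(catalog, schema, table):
    """Return a Trino-style fully qualified table name: catalog.schema.table."""
    return ".".join([catalog, schema, table])


def federated_join_sql():
    """Build one SQL statement that joins tables from two different catalogs.

    The catalogs (mysql, hive), schemas, tables and columns are hypothetical.
    """
    customers = qualified("mysql", "crm", "customers")
    orders = qualified("hive", "sales", "orders")
    return (
        "SELECT c.name, SUM(o.amount) AS total_spent\n"
        "FROM " + customers + " AS c\n"
        "JOIN " + orders + " AS o ON o.customer_id = c.id\n"
        "GROUP BY c.name"
    )


print(federated_join_sql())
```

&lt;p&gt;The resulting statement could then be pasted into the Trino CLI or sent through a client, assuming both catalogs are configured on the coordinator.&lt;/p&gt;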

&lt;p&gt;Trino POC: &lt;a href=&quot;https://github.com/Renien/trino-poc&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;Trino&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;/articles/Trino/&quot;&gt;Trino - Popular Distributed Interactive Query Engine&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on June 05, 2021.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Installing Spark & Livy with Ansible]]></title>
  <link rel="alternate" type="text/html" href="/articles/installing-spark-livy-ansible/"/>
  <id>/articles/installing-spark-livy-ansible</id>
  <published>2021-01-22T19:39:55+00:00</published>
  <updated>2021-01-22T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#Data%20Science" term="Data Science" /><category scheme="/tags/#Big%20Data" term="Big Data" /><category scheme="/tags/#Spark" term="Spark" /><category scheme="/tags/#Livy" term="Livy" /><category scheme="/tags/#Automate" term="Automate" /><category scheme="/tags/#Machine%20Learning" term="Machine Learning" />
  <content type="html">
    &lt;div class=&quot;github-fork-ribbon&quot; style=&quot;position: fixed;padding: 2px 0;background-color: #000;background-image: linear-gradient(to bottom, rgba(0, 0, 0, 0), rgba(0, 0, 0, 0.15));-webkit-box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.5);-moz-box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.5);box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.5);z-index: 9999;pointer-events: auto;top: 42px;right: -43px;-webkit-transform: rotate(45deg);-moz-transform: rotate(45deg);-ms-transform: rotate(45deg);-o-transform: rotate(45deg);transform: rotate(45deg);&quot;&gt;&lt;a href=&quot;https://github.com/Renien/docker-spark-livy&quot; style=&quot;font: 700 13px &amp;quot;Helvetica Neue&amp;quot;, Helvetica, Arial, sans-serif;color: #fff;text-decoration: none;text-shadow: 0 -1px rgba(0, 0, 0, 0.5);text-align: center;width: 200px;line-height: 20px;display: inline-block;padding: 2px 0;border-width: 1px 0;border-style: dotted;border-color: rgba(255, 255, 255, 0.7);&quot;&gt;Fork me on GitHub&lt;/a&gt;&lt;/div&gt;

&lt;p&gt;Deploying your own Spark Standalone cluster with Livy is a time-consuming task due to the complexity of the deployment. This project will also be useful for anyone who wants to play around with Spark and Livy on a local machine or in virtual machines.&lt;/p&gt;

&lt;p&gt;In my recent project we are building a &lt;strong&gt;SaaS platform&lt;/strong&gt; where we need to deploy dev and stage environments on different cloud providers, so I used Docker and Ansible for deployments. Spark can run on top of the Hadoop ecosystem or in standalone mode. I therefore created the &lt;strong&gt;latest version (2.4.7) of the Spark Standalone Docker image and the latest version of the Livy (0.7.0 Incubating) image&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://hub.docker.com/repository/docker/renien/spark-stand-alone&quot; target=&quot;_blank&quot;&gt;Spark Standalone Docker Image&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://hub.docker.com/r/renien/spark-stand-alone-livy&quot; target=&quot;_blank&quot;&gt;Spark with Livy Docker Image&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The following &lt;a href=&quot;https://github.com/Renien/ansible-spark-livy&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;starter kit&lt;/strong&gt;&lt;/a&gt; repository contains a simple Ansible playbook to install a Spark Standalone cluster and Livy in Docker. The current playbook contains only a local IP to install Docker and deploy the Spark cluster and Livy. Based on your needs, you can add more environments and automate your big data dev, stage, and prod environments.&lt;/p&gt;

&lt;p&gt;The repository &lt;a href=&quot;https://github.com/Renien/ansible-spark-livy/blob/master/README.md&quot; target=&quot;_blank&quot;&gt;README file&lt;/a&gt; contains more details about setting up the environments.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Thanks to Ansible we can scale the platform with one button click.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/ansible.jpeg&quot;&gt;&lt;img src=&quot;/articles/ansible.jpeg&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
&lt;/figure&gt;

&lt;p&gt;Starter Kit: &lt;a href=&quot;https://github.com/Renien/ansible-spark-livy&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;Ansible Spark Livy&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;/articles/installing-spark-livy-ansible/&quot;&gt;Installing Spark &amp;amp; Livy with Ansible&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on January 22, 2021.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Building Apache Spark with Play Framework]]></title>
  <link rel="alternate" type="text/html" href="/articles/play-with-spark/"/>
  <id>/articles/play-with-spark</id>
  <published>2020-03-10T19:39:55+00:00</published>
  <updated>2020-03-10T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#Data%20Science" term="Data Science" /><category scheme="/tags/#Big%20Data" term="Big Data" /><category scheme="/tags/#Spark" term="Spark" /><category scheme="/tags/#PlayFramework" term="PlayFramework" /><category scheme="/tags/#Scala" term="Scala" /><category scheme="/tags/#Machine%20Learning" term="Machine Learning" />
  <content type="html">
    &lt;div class=&quot;github-fork-ribbon&quot; style=&quot;position: fixed;padding: 2px 0;background-color: #000;background-image: linear-gradient(to bottom, rgba(0, 0, 0, 0), rgba(0, 0, 0, 0.15));-webkit-box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.5);-moz-box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.5);box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.5);z-index: 9999;pointer-events: auto;top: 42px;right: -43px;-webkit-transform: rotate(45deg);-moz-transform: rotate(45deg);-ms-transform: rotate(45deg);-o-transform: rotate(45deg);transform: rotate(45deg);&quot;&gt;&lt;a href=&quot;https://github.com/Renien/play-with-spark-starter-kit&quot; style=&quot;font: 700 13px &amp;quot;Helvetica Neue&amp;quot;, Helvetica, Arial, sans-serif;color: #fff;text-decoration: none;text-shadow: 0 -1px rgba(0, 0, 0, 0.5);text-align: center;width: 200px;line-height: 20px;display: inline-block;padding: 2px 0;border-width: 1px 0;border-style: dotted;border-color: rgba(255, 255, 255, 0.7);&quot;&gt;Fork me on GitHub&lt;/a&gt;&lt;/div&gt;

&lt;p&gt;Nowadays businesses are hiring data scientists in droves to make rigorous, scientific, unbiased, data-driven decisions. The data is huge, so the biggest challenge is to build fast, scalable systems for prediction and decision-making engines.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/decision.jpeg&quot;&gt;&lt;img src=&quot;/articles/decision.jpeg&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;Without a fast and scalable decision-making engine, a business will not be able to create data-driven business models&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There are situations where a business needs real-time predictions from models that are trained daily. One of the most popular high-velocity web frameworks is Play Framework, which lets you build web applications that follow the model-view-controller (MVC) architectural pattern. Apache Spark is a lightning-fast cluster computing engine that supports many mature distributed machine learning algorithms.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;What if we can run Apache Spark on a local machine along with Play Framework controllers?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If we can run an Apache Spark application alongside a Play application, we can leverage Spark ML algorithms for real-time predictions.&lt;/p&gt;

&lt;p&gt;In my recent work I had to set up Play with Spark, and I faced multiple dependency issues with Akka and Guava versions. So I have created a starter kit for anyone interested in working with Play and Spark.&lt;/p&gt;

&lt;p&gt;The starter kit includes an API that counts words using the Spark context and returns the result as the response (http://localhost:9000/prediction).&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/model-prediction.png&quot;&gt;&lt;img src=&quot;/articles/model-prediction.png&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
&lt;/figure&gt;

&lt;p&gt;The example includes a Guice module for Spark model loading, which creates the Spark Context/Spark Session as a singleton object at application startup.&lt;/p&gt;

&lt;p&gt;By following this example we can integrate any complex Spark ML model to provide real-time predictions.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nb&quot;&gt;.&lt;/span&gt;
├── app
│   ├── controllers
│   │   └── HomeController.java
│   ├── guice
│   │   └── module
│   │       └── MLLibModule.java              &lt;span class=&quot;nt&quot;&gt;--&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; Machine Learning guice module
│   ├── ml
│   │   └── ModelPrediction.scala             &lt;span class=&quot;nt&quot;&gt;--&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; Model prediction class &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;word count example&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
│   ├── response
│   ├── startup
│   │   └── AppLoader.java                    &lt;span class=&quot;nt&quot;&gt;--&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; App loader module
&lt;span class=&quot;nb&quot;&gt;.&lt;/span&gt;
.&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
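
&lt;p&gt;The create-once, reuse-on-every-request idea behind the Guice module can be sketched in plain Python. This is only a stand-in for the pattern, not the starter kit’s Java/Scala code: FakeSession, get_session, and prediction_handler are hypothetical names, and the word count runs in ordinary Python instead of on Spark.&lt;/p&gt;

```python
from collections import Counter
from functools import lru_cache


class FakeSession:
    """Hypothetical stand-in for an expensive engine handle such as a Spark session."""

    instances_created = 0

    def __init__(self):
        FakeSession.instances_created += 1

    def word_count(self, text):
        """Count whitespace-separated words, ignoring case."""
        return dict(Counter(text.lower().split()))


@lru_cache(maxsize=None)
def get_session():
    """Create the session on first use and return the same object afterwards."""
    return FakeSession()


def prediction_handler(text):
    """Hypothetical request handler: reuses the shared session on every call."""
    return get_session().word_count(text)
```

&lt;p&gt;Calling get_session() once while the application boots makes the creation eager, which is closer to what the starter kit describes; every later request then pays only for the prediction itself.&lt;/p&gt;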

&lt;p&gt;Starter Kit: &lt;a href=&quot;https://github.com/Renien/play-with-spark-starter-kit&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;Play-With-Spark&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;/articles/play-with-spark/&quot;&gt;Building Apache Spark with Play Framework&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on March 10, 2020.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Life is a Journey]]></title>
  <link rel="alternate" type="text/html" href="/articles/life-is-a-journey/"/>
  <id>/articles/life-is-a-journey</id>
  <published>2017-12-31T19:39:55+00:00</published>
  <updated>2017-12-31T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#Life" term="Life" /><category scheme="/tags/#Lessons" term="Lessons" />
  <content type="html">
    &lt;p&gt;Most of you do not know me, but I’ve written this article specifically for you. Yes, you. I know you’re feeling pretty down right now and that your current situation makes the world seem full of misery.&lt;/p&gt;

&lt;p&gt;Friends who know me would say,&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Come on man, this is supposed to be a TECH blog, what is all this scrap?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My response,&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Hold on guys! It’s year end, I can take one more step to make another mistake and learn from it :P”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Hey there, don’t worry: this blog will always be about tech, and I just want to &lt;strong&gt;share&lt;/strong&gt; what I learned from my experience in 2017.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/sharing-is-caring.jpg&quot;&gt;&lt;img src=&quot;/articles/sharing-is-caring.jpg&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Sharing is Caring
    &lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Most of us, including me, have great plans at the beginning of the year, but after a couple of months we forget the plan and simply react to whatever the past few days brought. At year-end it becomes a kind of feedback loop: you realise all your mistakes and freak out.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/dont-kw-wht-life.png&quot;&gt;&lt;img src=&quot;/articles/dont-kw-wht-life.png&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt; 
    &lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;I’m not here to tell you that next year you can plan again and achieve it all, and that everything is going to be okay; you already know that. But despite knowing it, you still feel doubtful. I was always under the impression that I could handle all the pressure, but in the last 3 weeks I realised that what I faced was terrible. I was patient, handling work life and family, but some actions can still end in the biggest disaster.&lt;/p&gt;

&lt;p&gt;But my dear friends, never lose hope. Experience matters, so always keep in mind,&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Life is a Journey, Not a Destination”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;happy-new-year-2018-&quot;&gt;Happy New Year 2018 ..&lt;/h2&gt;


    &lt;p&gt;&lt;a href=&quot;/articles/life-is-a-journey/&quot;&gt;Life is a Journey&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on December 31, 2017.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Cloud-DataSync]]></title>
  <link rel="alternate" type="text/html" href="/articles/cloud-bigdata-sync/"/>
  <id>/articles/cloud-bigdata-sync</id>
  <published>2017-05-09T19:39:55+00:00</published>
  <updated>2017-05-09T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#Big%20Data" term="Big Data" /><category scheme="/tags/#Cloud" term="Cloud" /><category scheme="/tags/#Shell" term="Shell" /><category scheme="/tags/#Hadoop" term="Hadoop" />
  <content type="html">
    &lt;p&gt;There are plenty of good reasons to move towards the cloud, but mainly it makes good business sense. Services like AWS Elastic MapReduce (EMR) and GCP Dataproc make it easy for companies to maintain large data and extract intent from it.&lt;/p&gt;

&lt;p&gt;In big data platforms, Lambda Architecture (LA) is a well-known concept for scalable, fault-tolerant data processing. Nathan Marz addresses the importance of &lt;a href=&quot;https://www.amazon.com/Big-Data-Principles-practices-scalable/dp/1617290343&quot; target=&quot;_blank&quot;&gt;&lt;em&gt;‘vertically partitioned data’ – Big Data&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The main requirement for big data storage is reliability, and it is recommended to store the data in partitions (e.g. daily or hourly partitions). Partitioning the data vertically has many advantages. One important scenario arises during processing: if the extracted click stream data is corrupted after a deployment because of a bug, it is much easier to skip the corrupted partitions than to spend long hours debugging the whole dataset.&lt;/p&gt;

&lt;p&gt;Therefore, the biggest challenge during a migration is moving all the partitioned and non-partitioned HDFS data to cloud buckets. To support the migration and keep the data in sync,&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/cloud-data-sync.png&quot;&gt;&lt;img src=&quot;/articles/cloud-data-sync.png&quot; alt=&quot;image&quot; style=&quot;width: 25%; height: 25%;&quot; /&gt;&lt;/a&gt;
&lt;/figure&gt;

&lt;p&gt;Introducing: &lt;a href=&quot;https://github.com/Renien/cloud-datasync&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;cloud-datasync&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The cloud-datasync tool currently supports incrementally copying partitioned data and bulk copying non-partitioned data to GCP/AWS cloud storage from any local Linux/Mac machine. These are the new features I have planned for the tool:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Use the ‘distcp’ command to efficiently copy data from a local HDFS cluster to any cloud service&lt;/li&gt;
  &lt;li&gt;Automatically generate an Azkaban job flow and schedule the data copy jobs on Azkaban servers. We can use the &lt;a href=&quot;https://pypi.python.org/pypi/azkaban/0.6.43&quot; target=&quot;_blank&quot;&gt;&lt;em&gt;Azkaban Python Package&lt;/em&gt;&lt;/a&gt; to implement this feature.&lt;/li&gt;
&lt;/ul&gt;
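
&lt;p&gt;The incremental copy of partitioned data mentioned above can be pictured with a minimal Python sketch. It is not the tool’s actual code: the function names and the ‘dt=YYYY-MM-DD’ daily partition layout are assumptions made purely for illustration. The idea is to list the partitions in the source, compare them with what is already in the bucket, and copy only the missing ones.&lt;/p&gt;

```python
from datetime import date, timedelta


def daily_partitions(start, end):
    """Return daily partition names such as 'dt=2017-05-01' for start..end inclusive."""
    days = (end - start).days
    return [f"dt={start + timedelta(days=offset)}" for offset in range(days + 1)]


def partitions_to_copy(source_partitions, copied_partitions):
    """Incremental step: keep only source partitions that are not in the bucket yet."""
    already_copied = set(copied_partitions)
    return [name for name in source_partitions if name not in already_copied]


# Hypothetical state: five daily partitions locally, two already in the bucket.
source = daily_partitions(date(2017, 5, 1), date(2017, 5, 5))
copied = ["dt=2017-05-01", "dt=2017-05-02"]
pending = partitions_to_copy(source, copied)
print(pending)
```

&lt;p&gt;Non-partitioned data has no such unit to compare, which is why it is copied in bulk instead.&lt;/p&gt;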

&lt;p&gt;Let me know what you think about ‘cloud-datasync’ below in the comments and share your thoughts. If you want to share any new features/issues, feel free to open an issue in the GitHub repository.&lt;/p&gt;


    &lt;p&gt;&lt;a href=&quot;/articles/cloud-bigdata-sync/&quot;&gt;Cloud-DataSync&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on May 09, 2017.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Doc-Diff (Python Library)]]></title>
  <link rel="alternate" type="text/html" href="/articles/doc-diff/"/>
  <id>/articles/doc-diff</id>
  <published>2017-04-02T19:39:55+00:00</published>
  <updated>2017-04-02T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#Data%20Science" term="Data Science" /><category scheme="/tags/#Big%20Data" term="Big Data" /><category scheme="/tags/#Python" term="Python" /><category scheme="/tags/#Data%20Mining" term="Data Mining" />
  <content type="html">
    &lt;div class=&quot;github-fork-ribbon&quot; style=&quot;position: fixed;padding: 2px 0;background-color: #000;background-image: linear-gradient(to bottom, rgba(0, 0, 0, 0), rgba(0, 0, 0, 0.15));-webkit-box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.5);-moz-box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.5);box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.5);z-index: 9999;pointer-events: auto;top: 42px;right: -43px;-webkit-transform: rotate(45deg);-moz-transform: rotate(45deg);-ms-transform: rotate(45deg);-o-transform: rotate(45deg);transform: rotate(45deg);&quot;&gt;&lt;a href=&quot;https://github.com/Renien/doc-diff&quot; style=&quot;font: 700 13px &amp;quot;Helvetica Neue&amp;quot;, Helvetica, Arial, sans-serif;color: #fff;text-decoration: none;text-shadow: 0 -1px rgba(0, 0, 0, 0.5);text-align: center;width: 200px;line-height: 20px;display: inline-block;padding: 2px 0;border-width: 1px 0;border-style: dotted;border-color: rgba(255, 255, 255, 0.7);&quot;&gt;Fork me on GitHub&lt;/a&gt;&lt;/div&gt;

&lt;p&gt;Python is often the choice for developers who need to apply data analysis in their work, and especially for data scientists and data engineers, whose tasks are more about deriving insight from data.&lt;/p&gt;

&lt;p&gt;One of Python’s greatest assets is its extensive set of libraries. Recently, I was working with very popular Data Mining algorithms (namely FP-Growth and a custom A-Priori). I needed a comprehensive analysis report on the results generated by these algorithms.&lt;/p&gt;

&lt;p&gt;As a support library for Data Science work, I am introducing &lt;a href=&quot;https://github.com/Renien/doc-diff&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;“doc-diff: Generate the diff data between two files”&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/doc-diff-logo.png&quot;&gt;&lt;img src=&quot;/articles/doc-diff-logo.png&quot; alt=&quot;image&quot; style=&quot;width: 25%; height: 25%;&quot; /&gt;&lt;/a&gt;
&lt;/figure&gt;

&lt;p&gt;doc-diff supports the following features:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Generate the following comparison reports
    &lt;ul&gt;
      &lt;li&gt;common_in_doc1-and-doc2-%Y-%m-%d.csv&lt;/li&gt;
      &lt;li&gt;common_key_with_diff_values-%Y-%m-%d.csv&lt;/li&gt;
      &lt;li&gt;exclusive_in_doc1-%Y-%m-%d.csv&lt;/li&gt;
      &lt;li&gt;exclusive_in_doc2-%Y-%m-%d.csv&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Compare two files and return the following ‘dicts(prodCode, recommendation)’
    &lt;ul&gt;
      &lt;li&gt;common_in_doc1_and_doc2_list = dicts()&lt;/li&gt;
      &lt;li&gt;common_key_with_diff_values_list = dicts()&lt;/li&gt;
      &lt;li&gt;exclusive_in_doc1_list = dicts()&lt;/li&gt;
      &lt;li&gt;exclusive_in_doc2_list = dicts()&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
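
&lt;p&gt;To make the four result groups concrete, here is a small sketch in plain Python. It is not the doc-diff implementation: the function name, the sample data, and the reading of “common” as “same key and same value” are assumptions made for illustration only.&lt;/p&gt;

```python
def compare_docs(doc1, doc2):
    """Split two {prodCode: recommendation} dicts into the four comparison groups."""
    common_in_doc1_and_doc2 = {}
    common_key_with_diff_values = {}
    for key, value in doc1.items():
        if key in doc2:
            if doc2[key] == value:
                # Same key and same value in both documents.
                common_in_doc1_and_doc2[key] = value
            else:
                # Same key, but the two documents disagree on the value.
                common_key_with_diff_values[key] = (value, doc2[key])
    exclusive_in_doc1 = {k: v for k, v in doc1.items() if k not in doc2}
    exclusive_in_doc2 = {k: v for k, v in doc2.items() if k not in doc1}
    return (common_in_doc1_and_doc2, common_key_with_diff_values,
            exclusive_in_doc1, exclusive_in_doc2)


# Hypothetical recommendation results from two algorithms.
a_priori = {"P1": "P2", "P3": "P4", "P5": "P6"}
pfp = {"P1": "P2", "P3": "P9", "P7": "P8"}
common, changed, only_1, only_2 = compare_docs(a_priori, pfp)
print(common, changed, only_1, only_2)
```

&lt;p&gt;Each group corresponds to one of the report files listed above, so a report writer only needs to dump each dict to its own CSV.&lt;/p&gt;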

&lt;h2 id=&quot;install&quot;&gt;Install&lt;/h2&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;pip &lt;span class=&quot;nb&quot;&gt;install &lt;/span&gt;doc-diff&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h2 id=&quot;implementation&quot;&gt;Implementation&lt;/h2&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;doc_diff&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Diff&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;doc_diff&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gen_comp_report&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;__name__&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;__main__&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# Data file location
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;a_priori_csv_location&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;./data/a-priori.csv&quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;pfp_csv_location&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;./data/pfp.csv&quot;&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Process a-priori.csv data file
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;a_priori_diff&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Diff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a_priori_csv_location&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;a_priori_diff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;process_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Process pfp.csv data file
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;pfp_diff&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Diff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pfp_csv_location&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;pfp_diff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;process_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;gen_comp_report&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a_priori_diff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pfp_diff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;I’m looking forward to open-sourcing all my support libraries for Data Science/Data Engineering work. Let me know what you think about ‘doc-diff’ in the comments below. If you want to share any new features/issues, &lt;a href=&quot;https://github.com/Renien/doc-diff/issues&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;feel free to open an issue in the GitHub repository&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;


    &lt;p&gt;&lt;a href=&quot;/articles/doc-diff/&quot;&gt;Doc-Diff (Python Library)&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on April 02, 2017.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[ETL-Starter-Kit (Extract Transform Load)]]></title>
  <link rel="alternate" type="text/html" href="/articles/etl-starter-kit/"/>
  <id>/articles/etl-starter-kit</id>
  <published>2017-03-20T19:39:55+00:00</published>
  <updated>2017-03-20T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#Data%20Science" term="Data Science" /><category scheme="/tags/#Big%20Data" term="Big Data" /><category scheme="/tags/#ETL" term="ETL" /><category scheme="/tags/#Hadoop" term="Hadoop" /><category scheme="/tags/#Scalding" term="Scalding" /><category scheme="/tags/#Azkaban" term="Azkaban" /><category scheme="/tags/#Gradle" term="Gradle" />
  <content type="html">
    &lt;div class=&quot;github-fork-ribbon&quot; style=&quot;position: fixed;padding: 2px 0;background-color: #000;background-image: linear-gradient(to bottom, rgba(0, 0, 0, 0), rgba(0, 0, 0, 0.15));-webkit-box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.5);-moz-box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.5);box-shadow: 0 2px 3px 0 rgba(0, 0, 0, 0.5);z-index: 9999;pointer-events: auto;top: 42px;right: -43px;-webkit-transform: rotate(45deg);-moz-transform: rotate(45deg);-ms-transform: rotate(45deg);-o-transform: rotate(45deg);transform: rotate(45deg);&quot;&gt;&lt;a href=&quot;https://github.com/Renien/ETL-Starter-Kit&quot; style=&quot;font: 700 13px &amp;quot;Helvetica Neue&amp;quot;, Helvetica, Arial, sans-serif;color: #fff;text-decoration: none;text-shadow: 0 -1px rgba(0, 0, 0, 0.5);text-align: center;width: 200px;line-height: 20px;display: inline-block;padding: 2px 0;border-width: 1px 0;border-style: dotted;border-color: rgba(255, 255, 255, 0.7);&quot;&gt;Fork me on GitHub&lt;/a&gt;&lt;/div&gt;

&lt;p&gt;Lambda Architecture (LA) is a well-known data processing architecture designed to handle massive amounts of data, commonly known as Big Data. It combines both batch and stream processing methods.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/etl-lamda.jpg&quot;&gt;&lt;img src=&quot;/articles/etl-lamda.jpg&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
&lt;/figure&gt;

&lt;p&gt;When I started my journey in Data Science and needed to process massive data, I came across an excellent book,&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/big-data-book.jpeg&quot;&gt;&lt;img src=&quot;/articles/big-data-book.jpeg&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Big Data: Principles and best practices of scalable real-time data systems by &lt;a href=&quot;https://twitter.com/nathanmarz&quot; target=&quot;_blank&quot;&gt;Nathan Marz&lt;/a&gt; and James Warren.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I recommend this book to all Big Data/Data Science developers.&lt;/p&gt;

&lt;p&gt;In a Big Data/Data Science project, LA aims to satisfy the need for a system that can scale up and down independently and is fault tolerant. Therefore, it is essential to get the project structure right before starting the implementation.&lt;/p&gt;

&lt;p&gt;I spent almost a couple of years building Lambda Architecture designs for client-specific projects. I realized that starter-kit projects are lacking in the Big Data domain compared to the JavaScript world. So, I have published a minimal skeleton project for processing Big Data,&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/ELT.png&quot;&gt;&lt;img src=&quot;/articles/ELT.png&quot; alt=&quot;image&quot; width=&quot;20%&quot; height=&quot;20%&quot; /&gt;&lt;/a&gt;
&lt;/figure&gt;

&lt;p&gt;Introducing: &lt;a href=&quot;https://github.com/Renien/ETL-Starter-Kit&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;ETL-Starter-Kit&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since the repository is meant to provide only the structure, it does not implement many different types of sample jobs. Feel free to modify it and implement different types of batch/streaming jobs (Spark, Hive, Pig, etc.) based on your requirements.&lt;/p&gt;


    &lt;p&gt;&lt;a href=&quot;/articles/etl-starter-kit/&quot;&gt;ETL-Starter-Kit (Extract Transform Load)&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on March 20, 2017.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Gradle build and Jar Hell]]></title>
  <link rel="alternate" type="text/html" href="/blog/gradle-buidl-and-jar-hell/"/>
  <id>/blog/gradle-buidl-and-jar-hell</id>
  <published>2017-03-09T00:00:00+00:00</published>
  <updated>2017-03-09T00:00:00+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#Gradle" term="Gradle" /><category scheme="/tags/#Groovy" term="Groovy" /><category scheme="/tags/#Hadoop" term="Hadoop" /><category scheme="/tags/#BigData" term="BigData" />
  <content type="html">
    &lt;h1 id=&quot;overview&quot;&gt;Overview&lt;/h1&gt;

&lt;p&gt;One of the most challenging tasks is setting up the ground-level structure and build scripts for a large, growing project. I spent almost a week incrementally building the product features and updating the build scripts for a data science project.&lt;/p&gt;

&lt;p&gt;To make sure all my dependencies are available to my Hadoop MapReduce jobs, I create a fat JAR. Because of the many dependencies in the project and on the Hadoop cluster, I ran into a JAR Hell problem. JAR Hell is an endearing term for the problems that arise from the characteristics of Java’s class loading mechanism. Put simply, it is a dependency version conflict.&lt;/p&gt;

&lt;p&gt;So I decided to use a few libraries only at compilation time; during Hadoop job execution they are resolved from the HDP libraries. Since the provided scope was not supported in some Gradle versions, I implemented a workaround in the Groovy build script.&lt;/p&gt;

&lt;p&gt;Create your own configuration:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-gradle&quot; data-lang=&quot;gradle&quot;&gt;&lt;span class=&quot;n&quot;&gt;configurations&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;providedCompile&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;and set it to be used with the compilation classpath:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-gradle&quot; data-lang=&quot;gradle&quot;&gt;&lt;span class=&quot;k&quot;&gt;sourceSets&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;main&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;compileClasspath&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;configurations&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;providedCompile&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;After adding the ‘providedCompile’ configuration, I modified my build script to use a few libraries only during compilation. For example:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-gradle&quot; data-lang=&quot;gradle&quot;&gt;&lt;span class=&quot;n&quot;&gt;providedCompile&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;([&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;libs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;pig&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Unfortunately, the Gradle build failed with the following error message:&lt;/p&gt;

&lt;p&gt;&lt;span style=&quot;color:red&quot;&gt;You can’t change configuration ‘providedCompile’ because it is already resolved!&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;The issue was in Groovy: &lt;a href=&quot;https://discuss.gradle.org/t/custom-provided-configuration-not-working-with-gradle-2-0-rc2-in-multi-project-mode/2459&quot; target=&quot;_blank&quot;&gt;Gradle Community Forums&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I faced this issue because I was using Gradle 2, which was updated to Groovy 2.3. With that version, adding a single element to a collection in this way is no longer supported. Instead of adding the single element directly, we need to wrap it in a collection and add that collection.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-gradle&quot; data-lang=&quot;gradle&quot;&gt;&lt;span class=&quot;k&quot;&gt;sourceSets&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;main&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;compileClasspath&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;configurations&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;providedCompile&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;After fixing these issues in the build script, I was able to build and execute the job successfully.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;/blog/gradle-buidl-and-jar-hell/&quot;&gt;Gradle build and Jar Hell&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on March 09, 2017.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Gradle: Invalid method Code length 66047 in class file jobs_51dck7bh02aiqhihu32e2hl475]]></title>
  <link rel="alternate" type="text/html" href="/blog/gradle-invalid-code-length/"/>
  <id>/blog/gradle-invalid-code-length</id>
  <published>2017-02-27T00:00:00+00:00</published>
  <updated>2017-02-27T00:00:00+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#Gradle" term="Gradle" /><category scheme="/tags/#Groovy" term="Groovy" /><category scheme="/tags/#Java" term="Java" />
  <content type="html">
    &lt;h1 id=&quot;overview&quot;&gt;Overview&lt;/h1&gt;

&lt;p&gt;Recently, I came across the following issue (&lt;em&gt;Figure 1&lt;/em&gt;) when I started adding more custom tasks to my build script. The current project code base is huge, and a lot of custom build steps need to be implemented to automate the project build.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
  &lt;a href=&quot;/blog/gradle-error-invalid-method-code-length.jpg&quot;&gt;&lt;img src=&quot;/blog/gradle-error-invalid-method-code-length.jpg&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Figure 1 - Invalid Method Length&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;After reading many articles, I learned that this error can be caused by the length of a method in Java code. The JVM limits the bytecode of a single method to 65535 bytes, and the 66047 bytes reported in the error exceed that. Bytecode transformation, such as the instrumentation needed to mock static methods, can push the final size of a method over this limit. However, even after I shortened my Groovy custom tasks, the build kept throwing the same error.&lt;/p&gt;

&lt;p&gt;Without a clear reason, I decided to modularize the build script and import the different sub.gradle files into the main Gradle file so that the number of lines of code in each script would be reduced. At last, I was able to build successfully.&lt;/p&gt;

&lt;p&gt;In short, the build script was so long that, once it was transformed to Java bytecode, it exceeded the JVM method size limit.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;/blog/gradle-invalid-code-length/&quot;&gt;Gradle: Invalid method Code length 66047 in class file jobs_51dck7bh02aiqhihu32e2hl475&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on February 27, 2017.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Interest Towards Data Science and Lessons Learnt]]></title>
  <link rel="alternate" type="text/html" href="/articles/interest-towards-ds-lesson-learnt/"/>
  <id>/articles/interest-towards-ds-lesson-learnt</id>
  <published>2016-12-30T19:39:55+00:00</published>
  <updated>2016-12-30T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#Data%20Science" term="Data Science" /><category scheme="/tags/#Life" term="Life" /><category scheme="/tags/#Lessons" term="Lessons" />
  <content type="html">
    &lt;p&gt;It’s been a long time since I shared some valuable information. For half of 2016, I spent a long time actively building myself up and filling my gaps so that I could work on some very interesting projects.&lt;/p&gt;

&lt;p&gt;The IT field offers numerous areas and opportunities to work in. Having had some experience in most of them, I found that nothing interested me like working with data. The interest actually started in mid 2015, when I wrote my first article on &lt;a href=&quot;http://renien.com/articles/learn-about-big-data/&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;“Learn about Big Data”&lt;/strong&gt;&lt;/a&gt;. After reading many blogs and articles, I ended up at one image. &lt;em&gt;Figure 1&lt;/em&gt; shows the skill sets that you need to work in the Data Science field. Looking at Big Data and Data Science scared me a lot.&lt;/p&gt;

&lt;figure style=&quot;text-align: center;&quot;&gt;
	&lt;a href=&quot;/articles/data-science-vd.png&quot;&gt;&lt;img src=&quot;/articles/data-science-vd.png&quot; alt=&quot;image&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 1: Skills for Data Science (image from another source)
    &lt;/figcaption&gt;
&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Of course, who wouldn’t get scared when they look at this Venn diagram :P”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I have always believed that technology should give us superpowers and that data is the soul of everything, so I thought, let’s give it a try.&lt;/p&gt;

&lt;p&gt;It is still not an easy journey; every day we learn new things from everywhere. But when I look back from mid 2016 up to now, it surprises me how much I didn’t know back then. It’s all a matter of identifying your weaknesses and trying hard to overcome them. The moment you can identify your weaknesses, believe me, you’re on the right path. Many people will have their own different goals. But I like to put it in a different way,&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Set your goals around what you really like to do and what “makes you happy”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In 2016, I spoke to a lot of professionals and students about goals and being happy in what they are doing. Most of them are still confused about it. I believe that is a good sign, because I always feel the same confusion myself.&lt;/p&gt;

&lt;p&gt;The confusion helps to open up the untouched energy in your brain, and it makes you consider lots of factors before taking a decision. Decisions can still go wrong; it is up to you to overcome that and work towards the next steps.&lt;/p&gt;

&lt;p&gt;The current situation is different. The growth of technology has brought everything to our doorsteps. If you want to learn new things in your field, there are more than enough resources available on the internet, and it is in our hands to find our diamonds.&lt;/p&gt;

&lt;p&gt;If you later realise what you really want to do in your life, my dear friends, don’t think you’re too late. Take that day as your start date, work hard on it, and the results will follow.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Life is about growing and improving and getting better”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;happy-new-year-2017-&quot;&gt;Happy New Year 2017 ..&lt;/h2&gt;


    &lt;p&gt;&lt;a href=&quot;/articles/interest-towards-ds-lesson-learnt/&quot;&gt;Interest Towards Data Science and Lessons Learnt&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on December 30, 2016.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Exclude particular fields from SELECT queries in Hive]]></title>
  <link rel="alternate" type="text/html" href="/blog/exclude-field-hive-query/"/>
  <id>/blog/exclude-field-hive-query</id>
  <published>2016-06-28T00:00:00+00:00</published>
  <updated>2016-06-28T00:00:00+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#BigData" term="BigData" /><category scheme="/tags/#Hive" term="Hive" /><category scheme="/tags/#MapReduce" term="MapReduce" />
  <content type="html">
    &lt;p&gt;Hive provides a high-level, SQL-like query language for analysing large volumes of data. The easiest way to select specific columns in a Hive query is to list the column names in the SELECT statement.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col4&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;....&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Table1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;But imagine your table contains many columns (e.g. more than 100) and you need to exclude only a few of them. Listing every remaining column by name is tedious, so the query should select all columns except the ones you specify. Hive supports this through regular-expression column specifications. To use them, follow these steps.&lt;/p&gt;

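&lt;p&gt;In &lt;em&gt;‘hive-site.xml’&lt;/em&gt;, set the following property to &lt;em&gt;none&lt;/em&gt; (for a single session you can instead run &lt;em&gt;SET hive.support.quoted.identifiers=none;&lt;/em&gt;):&lt;/p&gt;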

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;hive.support.quoted.identifiers&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;none&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Then execute the query as:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`(extract_date)?+.+`&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;TABLE_NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

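&lt;p&gt;Replace &lt;em&gt;extract_date&lt;/em&gt; and &lt;em&gt;&amp;lt;TABLE_NAME&amp;gt;&lt;/em&gt; with the appropriate column and table names. The backquoted string is a Java regular expression matched against each column name: &lt;em&gt;(extract_date)?+&lt;/em&gt; possessively consumes the name to be excluded, and &lt;em&gt;.+&lt;/em&gt; then requires at least one more character, so the excluded column itself never matches while every other column does. To exclude several columns, separate their names with &lt;em&gt;|&lt;/em&gt; inside the parentheses.&lt;/p&gt;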

&lt;p&gt;For example, suppose you need to exclude only the &lt;em&gt;‘transaction_date’&lt;/em&gt; column when selecting from a &lt;em&gt;‘cart’&lt;/em&gt; table. The query then becomes:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`(transaction_date)?+.+`&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cart&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;


    &lt;p&gt;&lt;a href=&quot;/blog/exclude-field-hive-query/&quot;&gt;Exclude particular fields from SELECT queries in Hive&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on June 28, 2016.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Scoping Rules of R]]></title>
  <link rel="alternate" type="text/html" href="/blog/scoping-rules/"/>
  <id>/blog/scoping-rules</id>
  <published>2016-04-06T19:39:55+00:00</published>
  <updated>2016-04-06T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#R%20Language" term="R Language" /><category scheme="/tags/#Data" term="Data" />
  <content type="html">
    &lt;p&gt;When we use a symbol in our code, how does R know which value to bind to it?&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;R functions are treated much like any other R objects.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Assume we are running the following code,&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; c &amp;lt;- 100 &lt;span class=&quot;c&quot;&gt;## Assign a value to &apos;c&apos;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;c+1&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## Increment &lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;1] 101
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; 
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; vec &amp;lt;- c&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;1:10&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## Create a vector&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; vec
 &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;1]  1  2  3  4  5  6  7  8  9 10&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Notice that we can still use &lt;strong&gt;&lt;em&gt;c()&lt;/em&gt;&lt;/strong&gt; to create vectors, even after binding the value 100 to &lt;strong&gt;&lt;em&gt;c&lt;/em&gt;&lt;/strong&gt;. So how does R know which &lt;strong&gt;&lt;em&gt;c&lt;/em&gt;&lt;/strong&gt; to use, and when? It’s because,&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;R has separate namespaces for functions and non-functions.&lt;/p&gt;
&lt;/blockquote&gt;

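&lt;p&gt;Let’s try to understand it clearly. When a symbol is used as a function call, as in &lt;strong&gt;&lt;em&gt;c(1:10)&lt;/em&gt;&lt;/strong&gt;, R skips any non-function objects with that name. To &lt;strong&gt;&lt;em&gt;‘bind/connect’&lt;/em&gt;&lt;/strong&gt; a value to a symbol (in our case &lt;strong&gt;&lt;em&gt;c&lt;/em&gt;&lt;/strong&gt;), R searches a series of environments in order:&lt;/p&gt;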

&lt;ul&gt;
  &lt;li&gt;Search the global environment (workspace) for a symbol name matching the request.&lt;/li&gt;
  &lt;li&gt;Search the namespaces of each of the packages on the search list.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The search list contains the packages we have loaded, with &lt;strong&gt;&lt;em&gt;“.GlobalEnv”&lt;/em&gt;&lt;/strong&gt; always as the first item and &lt;strong&gt;&lt;em&gt;“base”&lt;/em&gt;&lt;/strong&gt; always at the very end.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; search&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;1] &lt;span class=&quot;s2&quot;&gt;&quot;.GlobalEnv&quot;&lt;/span&gt;        &lt;span class=&quot;s2&quot;&gt;&quot;tools:rstudio&quot;&lt;/span&gt;     &lt;span class=&quot;s2&quot;&gt;&quot;package:stats&quot;&lt;/span&gt;    
 &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;4] &lt;span class=&quot;s2&quot;&gt;&quot;package:graphics&quot;&lt;/span&gt;  &lt;span class=&quot;s2&quot;&gt;&quot;package:grDevices&quot;&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;package:utils&quot;&lt;/span&gt;    
 &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;7] &lt;span class=&quot;s2&quot;&gt;&quot;package:datasets&quot;&lt;/span&gt;  &lt;span class=&quot;s2&quot;&gt;&quot;package:methods&quot;&lt;/span&gt;   &lt;span class=&quot;s2&quot;&gt;&quot;Autoloads&quot;&lt;/span&gt;        
&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;10] &lt;span class=&quot;s2&quot;&gt;&quot;package:base&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;“.GlobalEnv”&lt;/em&gt;&lt;/strong&gt; is just our workspace. If a matching symbol exists there, R takes it from the workspace. If nothing is found, R continues through the remaining entries of the search list, one package at a time.&lt;/p&gt;
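
&lt;p&gt;A minimal sketch of this lookup order, reusing the &lt;strong&gt;&lt;em&gt;c&lt;/em&gt;&lt;/strong&gt; example from above (the value 100 is only illustrative). It relies on the standard &lt;strong&gt;&lt;em&gt;find()&lt;/em&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;em&gt;get()&lt;/em&gt;&lt;/strong&gt; functions to show where R can see a symbol and which object it picks when a function is required.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-r&quot; data-lang=&quot;r&quot;&gt;c &amp;lt;- 100                      ## illustrative value: a non-function &apos;c&apos; in .GlobalEnv
find(&quot;c&quot;)                     ## search-list entries that contain a &apos;c&apos;
get(&quot;c&quot;, mode = &quot;function&quot;)   ## skips the numeric &apos;c&apos; and returns the base function
c(c, 2)                       ## call position finds the function, the argument finds 100
rm(c)                         ## remove the workspace copy again&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Because the workspace comes first on the search list, a function defined there under the same name as a package function would be found first. A plain value, as in this sketch, is skipped whenever a function is needed.&lt;/p&gt;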

&lt;h2 id=&quot;rules-of-scoping&quot;&gt;Rules of scoping&lt;/h2&gt;

&lt;p&gt;R uses scoping rules called &lt;strong&gt;&lt;em&gt;Lexical scoping&lt;/em&gt;&lt;/strong&gt; (static scoping).&lt;/p&gt;

&lt;p&gt;These rules determine how a value is associated with a free variable in a function.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; fun &amp;lt;- &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;x,y&lt;span class=&quot;o&quot;&gt;){&lt;/span&gt;
+   x^2 + y / z
+ &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; fun&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;2,3&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The function above has two arguments, &lt;strong&gt;&lt;em&gt;x&lt;/em&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;em&gt;y&lt;/em&gt;&lt;/strong&gt;. But inside the function body there is another symbol, &lt;strong&gt;&lt;em&gt;‘z’&lt;/em&gt;&lt;/strong&gt;, which is neither an argument nor defined in the body. Such a symbol is called a &lt;strong&gt;&lt;em&gt;free variable&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;According to R’s scoping rules, the value of a free variable is first searched for in the environment where the function was defined. An environment is a collection of symbols and values. Every environment has a parent, with the empty environment as the only exception.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; parent.env&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;globalenv&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
&amp;lt;environment: 0x10390d6e0&amp;gt;
attr&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;,&lt;span class=&quot;s2&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;1] &lt;span class=&quot;s2&quot;&gt;&quot;tools:rstudio&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Since we defined the &lt;strong&gt;&lt;em&gt;fun&lt;/em&gt;&lt;/strong&gt; function in the global environment, R will look for &lt;strong&gt;&lt;em&gt;z&lt;/em&gt;&lt;/strong&gt; in that scope (environment).&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; environment&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;fun&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## to check the scope&lt;/span&gt;
&amp;lt;environment: R_GlobalEnv&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
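
&lt;p&gt;As a small sketch, assuming we give &lt;strong&gt;&lt;em&gt;z&lt;/em&gt;&lt;/strong&gt; a value in the global environment (the value 3 is chosen only for illustration), the call succeeds because R resolves the free variable in the environment where &lt;strong&gt;&lt;em&gt;fun&lt;/em&gt;&lt;/strong&gt; was defined. Without such a definition the call stops with an error saying that object ‘z’ was not found.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-r&quot; data-lang=&quot;r&quot;&gt;fun &amp;lt;- function(x, y) {
  x^2 + y / z   ## &apos;z&apos; is a free variable
}

z &amp;lt;- 3          ## illustrative value, defined where &apos;fun&apos; was defined
fun(2, 3)       ## 2^2 + 3 / 3 = 5&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;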

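&lt;blockquote&gt;
  &lt;p&gt;These rules matter because they decide which value is used once we define more complex logic and functions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the example below, &lt;strong&gt;&lt;em&gt;f2&lt;/em&gt;&lt;/strong&gt; is defined in the global environment, so its free variable &lt;strong&gt;&lt;em&gt;y&lt;/em&gt;&lt;/strong&gt; always resolves to the global &lt;strong&gt;&lt;em&gt;y&lt;/em&gt;&lt;/strong&gt; (10), even when &lt;strong&gt;&lt;em&gt;f2&lt;/em&gt;&lt;/strong&gt; is called from inside &lt;strong&gt;&lt;em&gt;f1&lt;/em&gt;&lt;/strong&gt;, where a local &lt;strong&gt;&lt;em&gt;y&lt;/em&gt;&lt;/strong&gt; (2) exists. That is why f1(2) returns 2^2 + 2 * 10 = 24 rather than 2^2 + 2 * 2 = 8.&lt;/p&gt;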

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; y &amp;lt;- 10 &lt;span class=&quot;c&quot;&gt;## y Symbol&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; 
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; f1 &amp;lt;- &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;x&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
+   y &amp;lt;- 2 &lt;span class=&quot;c&quot;&gt;## Binding value to &apos;y&apos; symbol&lt;/span&gt;
+   y^2 + f2&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;x&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## &apos;f2&apos; function is used&lt;/span&gt;
+ &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; 
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; f2 &amp;lt;- &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;x&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
+   x &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt; y
+ &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; 
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; f1&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;2&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## Execute &apos;f1&apos; function &lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;1] 24
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; f2&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;2&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## Execute &apos;f2&apos; function&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;1] 20&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
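
&lt;p&gt;Lexical scoping becomes most visible when a function is defined inside another function. The following is a hedged sketch rather than part of the original example: the names &lt;strong&gt;&lt;em&gt;make_power&lt;/em&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;em&gt;square&lt;/em&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;em&gt;cube&lt;/em&gt;&lt;/strong&gt; are made up for illustration. Each returned function keeps the environment it was created in, and that is where its free variable &lt;strong&gt;&lt;em&gt;n&lt;/em&gt;&lt;/strong&gt; is looked up.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-r&quot; data-lang=&quot;r&quot;&gt;## Hypothetical helper: returns a function that raises its input to the power n
make_power &amp;lt;- function(n) {
  function(x) x^n                      ## &apos;n&apos; is a free variable of the inner function
}

square &amp;lt;- make_power(2)
cube &amp;lt;- make_power(3)

square(4)                              ## 16
cube(2)                                ## 8
ls(environment(square))                ## &quot;n&quot;
get(&quot;n&quot;, envir = environment(cube))    ## 3&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;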

&lt;h3 id=&quot;blog-series&quot;&gt;Blog Series&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/articles/introduction-to-r-language/&quot;&gt;&lt;strong&gt;Introduction to R Language&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/hello-r-world/&quot;&gt;&lt;strong&gt;Hello R World&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/r-fundamentals/&quot;&gt;&lt;strong&gt;R Fundamentals&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/read-write-data/&quot;&gt;&lt;strong&gt;Read/Write Data into ‘R’ Language&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/store-data/&quot;&gt;&lt;strong&gt;Store data – Text/Binary Format&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/connections/&quot;&gt;&lt;strong&gt;Manipulate Connections in ‘R’ Language&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/subsetting/&quot;&gt;&lt;strong&gt;Subsetting Data/R Objects&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/control-strcuture/&quot;&gt;&lt;strong&gt;Control Structures&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/functions/&quot;&gt;&lt;strong&gt;Functions in R&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Scoping Rules of R&lt;/li&gt;
&lt;/ul&gt;

    &lt;p&gt;&lt;a href=&quot;/blog/scoping-rules/&quot;&gt;Scoping Rules of R&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on April 06, 2016.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Functions in R]]></title>
  <link rel="alternate" type="text/html" href="/blog/functions/"/>
  <id>/blog/functions</id>
  <published>2016-04-05T19:39:55+00:00</published>
  <updated>2016-04-05T19:39:55+00:00</updated>
  <author>
    <name>Renien Joseph</name>
    <uri></uri>
    <email>renien.john@email.com</email>
  </author>
  <category scheme="/tags/#R%20Language" term="R Language" /><category scheme="/tags/#Data" term="Data" />
  <content type="html">
    &lt;p&gt;We use &lt;em&gt;functions&lt;/em&gt; to avoid repeating the same few lines of code.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;R functions are treated much like any other R objects.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As in JavaScript, functions can be passed as arguments to other functions, and they can be nested too. Functions are defined using the &lt;strong&gt;&lt;em&gt;function()&lt;/em&gt;&lt;/strong&gt; directive.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; func &amp;lt;- &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(){&lt;/span&gt;
+   &lt;span class=&quot;c&quot;&gt;## Body of the function [empty]&lt;/span&gt;
+ &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; class&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;func&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## Identify the class type&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;1] &lt;span class=&quot;s2&quot;&gt;&quot;function&quot;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; func&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## Execute the function&lt;/span&gt;
NULL&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
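
&lt;p&gt;The earlier point about passing and nesting functions can be sketched as follows. The function names (&lt;strong&gt;&lt;em&gt;apply_twice&lt;/em&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;em&gt;add_one&lt;/em&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;em&gt;outer_fun&lt;/em&gt;&lt;/strong&gt;) are hypothetical and exist only for this illustration.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-r&quot; data-lang=&quot;r&quot;&gt;## A function received as an argument (hypothetical names)
apply_twice &amp;lt;- function(f, x) {
  f(f(x))                        ## &apos;f&apos; is an ordinary argument that holds a function
}
add_one &amp;lt;- function(x) x + 1
apply_twice(add_one, 5)          ## 7

## A function nested inside another function
outer_fun &amp;lt;- function(x) {
  helper &amp;lt;- function(y) y * 2    ## visible only inside outer_fun
  helper(x) + 1
}
outer_fun(10)                    ## 21

## An anonymous function passed directly to sapply()
sapply(1:3, function(i) i^2)     ## 1 4 9&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;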

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Function arguments&lt;/em&gt;&lt;/strong&gt; let us pass values into a function.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; func &amp;lt;- &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;num&lt;span class=&quot;o&quot;&gt;){&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## pass the num argument&lt;/span&gt;
+   &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;c &lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;num&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
+     &lt;span class=&quot;nb&quot;&gt;cat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;function reduces the # of code&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
+   &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+ &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; func&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;1:3&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## Execute the function&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;function &lt;/span&gt;reduces the &lt;span class=&quot;c&quot;&gt;# of code&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;function &lt;/span&gt;reduces the &lt;span class=&quot;c&quot;&gt;# of code&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;function &lt;/span&gt;reduces the &lt;span class=&quot;c&quot;&gt;# of code&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Calling this function without an argument fails, because &lt;strong&gt;&lt;em&gt;num&lt;/em&gt;&lt;/strong&gt; has no default. We can modify our code a bit by setting a &lt;strong&gt;&lt;em&gt;default value&lt;/em&gt;&lt;/strong&gt; for the argument, after which the function can be called without passing any value.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; func&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## Execute the function&lt;/span&gt;
Error &lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;func&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; : argument &lt;span class=&quot;s2&quot;&gt;&quot;num&quot;&lt;/span&gt; is missing, with no default

&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; func &amp;lt;- &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;num&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;1:2&lt;span class=&quot;o&quot;&gt;){&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## pass the num arguments&lt;/span&gt;
+   &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;c &lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;num&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
+     &lt;span class=&quot;nb&quot;&gt;cat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;function reduces the # of code&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
+   &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+ &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; func&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## Execute the function&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;function &lt;/span&gt;reduces the &lt;span class=&quot;c&quot;&gt;# of code&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;function &lt;/span&gt;reduces the &lt;span class=&quot;c&quot;&gt;# of code&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;lazy-evaluation&quot;&gt;Lazy Evaluation&lt;/h3&gt;

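&lt;p&gt;Function arguments are evaluated &lt;strong&gt;&lt;em&gt;lazily&lt;/em&gt;&lt;/strong&gt;, that is, only when the function body actually needs them. In the example below, &lt;strong&gt;&lt;em&gt;b&lt;/em&gt;&lt;/strong&gt; is never used in the body, so calling the function without it causes no error.&lt;/p&gt;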

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; add &amp;lt;- &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;a, b&lt;span class=&quot;o&quot;&gt;){&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## Evaluated lazily&lt;/span&gt;
+   &lt;span class=&quot;nb&quot;&gt;cat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;a&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; 
+ &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; add&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;3&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## Passing one argument value&lt;/span&gt;
3&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;When both arguments are used in the body, R has to evaluate &lt;strong&gt;&lt;em&gt;b&lt;/em&gt;&lt;/strong&gt; as well, and the missing value raises an error.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; add &amp;lt;- &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;a, b&lt;span class=&quot;o&quot;&gt;){&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## Evaluated lazily&lt;/span&gt;
+   &lt;span class=&quot;nb&quot;&gt;cat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;a + b&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## Both arguments are used &lt;/span&gt;
+ &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; add&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;3&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## Passing one argument value&lt;/span&gt;
Error &lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;cat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;a + b&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; : argument &lt;span class=&quot;s2&quot;&gt;&quot;b&quot;&lt;/span&gt; is missing, with no default&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
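
&lt;p&gt;A further sketch of lazy evaluation, using made-up functions &lt;strong&gt;&lt;em&gt;f&lt;/em&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;em&gt;g&lt;/em&gt;&lt;/strong&gt;: an argument is evaluated only if execution reaches an expression that uses it, so even an argument that would raise an error is harmless when it is never touched. The base function &lt;strong&gt;&lt;em&gt;force()&lt;/em&gt;&lt;/strong&gt; can be used to evaluate an argument immediately.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-r&quot; data-lang=&quot;r&quot;&gt;## Hypothetical function: &apos;b&apos; is needed only on one branch
f &amp;lt;- function(a, b) {
  if (a &amp;gt; 5) b else a            ## &apos;b&apos; is evaluated only when a is greater than 5
}
f(3)                               ## 3: &apos;b&apos; is missing but never needed
f(3, stop(&quot;never evaluated&quot;))      ## 3: the stop() call is never run

## Hypothetical function: force() evaluates &apos;b&apos; straight away
g &amp;lt;- function(a, b) {
  force(b)
  a
}
g(3, 4)                            ## 3
## g(3) would fail, because &apos;b&apos; is missing and force() needs its value&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;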

&lt;h3 id=&quot;arguments-with-&quot;&gt;Arguments with ‘…’&lt;/h3&gt;

&lt;p&gt;R has a special argument, &lt;strong&gt;&lt;em&gt;...&lt;/em&gt;&lt;/strong&gt;, which stands for a variable number of arguments that are usually passed on to other functions. It is often used when extending another function.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; plot &lt;span class=&quot;c&quot;&gt;## Execute to see the arguments &lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;x, y, ...&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; 
UseMethod&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;plot&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&amp;lt;bytecode: 0x10195dad0&amp;gt;
&amp;lt;environment: namespace:graphics&amp;gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; 
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; myplot &amp;lt;- &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;x, y, &lt;span class=&quot;nb&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;l&apos;&lt;/span&gt;, ...&lt;span class=&quot;o&quot;&gt;){&lt;/span&gt;
+   plot&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;x, y, &lt;span class=&quot;nb&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;type&lt;/span&gt;, ...&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## pass &apos;...&apos; to plot function&lt;/span&gt;
+ &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; 
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;## Create the x, y points to plot&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;seq&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;0,2&lt;span class=&quot;k&quot;&gt;*&lt;/span&gt;pi,0.01&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;sin&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;x&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; 
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;# Draw the plot graph [Figure 1]&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; myplot&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;x,y&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;figure&gt;
  &lt;a href=&quot;/blog/r-blog-series/argument-Rplot.jpeg&quot;&gt;
  &lt;img src=&quot;/blog/r-blog-series/argument-Rplot.jpeg&quot; alt=&quot;image&quot; style=&quot;display: block;
    margin: 0 auto;&quot; /&gt;
  &lt;/a&gt;
  &lt;figcaption&gt;Figure 1: Sample Plot Graph&lt;/figcaption&gt;
&lt;/figure&gt;
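
&lt;p&gt;Besides forwarding arguments to &lt;strong&gt;&lt;em&gt;plot()&lt;/em&gt;&lt;/strong&gt;, the same mechanism can be used to accept any number of arguments. The sketch below uses hypothetical function names (&lt;strong&gt;&lt;em&gt;count_args&lt;/em&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;em&gt;shout&lt;/em&gt;&lt;/strong&gt;) and shows two common patterns: capturing the arguments with &lt;strong&gt;&lt;em&gt;list(...)&lt;/em&gt;&lt;/strong&gt; and forwarding them to another function.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-r&quot; data-lang=&quot;r&quot;&gt;## Hypothetical function: capture everything passed through &apos;...&apos;
count_args &amp;lt;- function(...) {
  args &amp;lt;- list(...)
  length(args)
}
count_args(1, &quot;a&quot;, TRUE)           ## 3

## Hypothetical function: forward &apos;...&apos; to paste()
shout &amp;lt;- function(..., sep = &quot; &quot;) {
  toupper(paste(..., sep = sep))
}
shout(&quot;hello&quot;, &quot;r&quot;)                ## &quot;HELLO R&quot;
shout(&quot;a&quot;, &quot;b&quot;, sep = &quot;-&quot;)         ## &quot;A-B&quot;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Note that an argument placed after &lt;strong&gt;&lt;em&gt;...&lt;/em&gt;&lt;/strong&gt;, such as &lt;strong&gt;&lt;em&gt;sep&lt;/em&gt;&lt;/strong&gt; here, has to be given by name. An unnamed value would simply be absorbed into the dots.&lt;/p&gt;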

&lt;h3 id=&quot;blog-series&quot;&gt;Blog Series&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/articles/introduction-to-r-language/&quot;&gt;&lt;strong&gt;Introduction to R Language&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/hello-r-world/&quot;&gt;&lt;strong&gt;Hello R World&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/r-fundamentals/&quot;&gt;&lt;strong&gt;R Fundamentals&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/read-write-data/&quot;&gt;&lt;strong&gt;Read/Write Data into ‘R’ Language&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/store-data/&quot;&gt;&lt;strong&gt;Store data – Text/Binary Format&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/connections/&quot;&gt;&lt;strong&gt;Manipulate Connections in ‘R’ Language&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/subsetting/&quot;&gt;&lt;strong&gt;Subsetting Data/R Objects&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/control-strcuture/&quot;&gt;&lt;strong&gt;Control Structures&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Functions in R&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/scoping-rules/&quot;&gt;&lt;strong&gt;Scoping Rules of R&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

    &lt;p&gt;&lt;a href=&quot;/blog/functions/&quot;&gt;Functions in R&lt;/a&gt; was originally published by Renien Joseph at &lt;a href=&quot;&quot;&gt;Renien John Joseph&lt;/a&gt; on April 05, 2016.&lt;/p&gt;
  </content>
</entry>

</feed>
