Joe Conley Tagged productivity Random thoughts on technology, business, books, and everything in between jpc2.org/name/productivity Notebook Driven Development <p>Fellow Spark developers, hearken to me! How fast is your Spark development cycle? Slow? Really slow? You could use this <a href="http://www.josephpconley.com/2017/10/12/spark-template.html">super awesome template</a> to run your Spark jobs in IntelliJ, but sometimes you’re constrained by the size or locality of the data you’re working with, and each re-run takes time (which is precious and finite and all that, so yes, this stuff matters).</p> <p>The craftier among you might turn to that most estimable of tools, the REPL (Read-Eval-Print Loop), for quick command-line iteration. And that’s a good start. I use the Scala REPL daily, mostly to verify date/time formats and test regexes. Using the REPL with Spark, you avoid the overhead of starting up and shutting down the SparkContext, and you can quickly try things out with immediate feedback (cool). You can also enter the REPL from SBT using the <code class="language-plaintext highlighter-rouge">console</code> command, which gives you access to the classes and utilities you’ve built in that project as well as the project’s dependencies (very cool).</p> <h2 id="a-better-way">A Better Way</h2> <p>So yes, the REPL is nice and all, but you can go even FURTHER, FASTER with notebooks like <a href="https://zeppelin.apache.org/">Apache Zeppelin</a>. Zeppelin (like <a href="http://jupyter.org/">Jupyter</a>) lets you write snippets of runnable code in notebooks and execute them from the browser. What separates Zeppelin from Jupyter is how well it works with Spark out of the box. Spark is Zeppelin’s default interpreter, and it provides the SparkContext and SQLContext for you implicitly.
You also get great visualizations of SQL queries for free.</p> <table class="image"> <caption align="bottom">Simple SQL query using Zeppelin's bank example</caption> <tr><td><img src="/assets/zeppelin-sql.png" alt="Simple SQL query using Zeppelin's bank example" /></td></tr> </table> <p><br /></p> <table class="image"> <caption align="bottom">Simple SQL query with bar graph and form input</caption> <tr><td><img src="/assets/zeppelin-bar.png" alt="Simple SQL query with bar graph" /></td></tr> </table> <p><br /></p> <p>With Zeppelin, if you want to understand a dataset’s total size, the cardinality of a column, or simple descriptive statistics, you can find out immediately from the notebook itself with simple SQL queries. This sounds trivial, but it ABSOLUTELY saves you time and effort: you get a tight feedback loop when asking questions of your data, and as long as you <code class="language-plaintext highlighter-rouge">cache</code> the data, you don’t have to reload it every single time. In addition, you get documentation for free with Markdown, data visualization support with Angular, a growing ecosystem of Big Data modules, and simple support for collaboration and sharing across your team.</p> <p>I also think Zeppelin helps you write more scalable Spark code. Writing code in paragraphs reinforces the habit of keeping methods as small and concise as possible. Once these chunks of code are worked out, building out your codebase is more or less a matter of composing them into logical classes or methods.</p> <p>Zeppelin does have its drawbacks. Switching between your actual code and the notebook can be challenging, so it helps to keep dedicated contexts for exploration (Zeppelin) and for crafting a solution (codebase), and to stick to them. Dependency management is also too manual. I would love for Zeppelin to know everything my Spark job knows through some Vulcan mind meld or something (did I use that term correctly? I’m not a Trekkie.
I’m a whatever-you-call-Tolkien-book-lover-two-generations-removed. Ringer? Inkling? Istari?).</p> <h2 id="big-idea-section">Big Idea Section</h2> <p>Ultimately, I think Zeppelin is a great tool if you’re a Spark developer trying to build scalable systems in a reasonable amount of time. I think notebooks are <a href="https://www.youtube.com/watch?v=oHGK96-WixU">“what’s next”</a>. I think speed of development can be a big bottleneck in the software engineering process, especially when working with large volumes of data. And, most importantly, I think any company of reasonable size needs a certain level of useful, live documentation to understand just what the hell they’re doing.</p> <p>Because knowledge is power, right? Isn’t all of this “coding”, “documentation”, and “testing” just different ways to represent knowledge? Ultimately, <a href="http://www.lifeissues.net/writers/gro/gro_056heidegger.html">knowledge is just a tool</a>, a means to achieve some goal. It’s incumbent on us as engineers to use the best tools we can to accomplish our goals, and I think Zeppelin is one such tool. I also think we could take this idea further and eventually reach the point where all of the code we write is just simple chunks, easily composable with minimal overhead (why do we spend so much time on packaging and deployment?). Or maybe we’re wasting our time and should let <a href="https://www.oreilly.com/ideas/artificial-intelligence-in-the-software-engineering-workflow">AI do our dirty work</a> for us? Who knows, but for now, I guess we keep on…</p> <iframe width="560" height="315" src="https://www.youtube.com/embed/_h9MxNn8P7w?rel=0" frameborder="0" allowfullscreen=""></iframe> <p><br /></p> Tue, 28 Nov 2017 00:00:00 +0000 jpc2.org/2017/11/28/notebook-drvien-development.html Shallow vs. Deep Research <p>Are you like me?
Do you sometimes get stuck on the merry-go-round of googling for answers to technical questions?</p> <p>Don’t get me wrong. I think sites like Google and StackOverflow are amazing tools, and it’d be hard to be productive without them. They’re especially useful for conjuring up some obscure Linux command or the DDL syntax for one of the dozen databases I work with on a daily basis.</p> <p>But sometimes I notice that I rely on Google TOO much. When I have a problem to solve, I <em>immediately</em> go to Google to see how others have solved it. It’s a tempting, albeit understandable, trap to fall into. I’m a consultant, so I’m constantly focused on delivering value to my clients in a quick and effective manner. That makes it difficult to justify reading documentation, digging around in source code, or reading papers on the CAP theorem when there’s a good chance I can find the answer to my question in under 60 seconds via search.</p> <p>In the long run, though, who am I helping by doing this? I’m essentially outsourcing part of my job to someone else. And what’s worse, I’m <em>tricking myself</em> into believing I’ve mastered a certain subject or capability, when in reality I’ve just copied what others worked hard to figure out.</p> <p>In “How Will You Measure Your Life?”, Clayton Christensen tells the story of Dell and Asus. When Dell first started out, they used Asus to manufacture their chips. As Dell grew, Asus offered to manufacture more and more of the computer, until it was manufacturing the entire computer for Dell. In short order, Asus struck out on its own as a low-cost competitor to Dell. Though each step in the outsourcing process looked good from a balance-sheet perspective, in the long term this strategy posed a serious threat to Dell.</p> <p>What’s the lesson here? Don’t sacrifice long-term growth and learning for the quick hit of an answer on Google.
If you’re stuck in a <strong>really, really</strong> time-sensitive situation where you need the quick answer, then leave yourself a TODO to do a deep dive on the problem in your spare time. Once you have time, use <a href="https://www.farnamstreetblog.com/2012/04/learn-anything-faster-with-the-feynman-technique/">the Feynman technique</a> to deeply understand the problem, and try to question it from all angles. It’s ultimately up to you to decide how far down the rabbit hole to go.</p> <iframe width="560" height="315" src="https://www.youtube.com/embed/M7W2I9FGF9U" frameborder="0" allowfullscreen=""></iframe> <p>It’s easy to get overwhelmed by the daily demands of your job. But it’s also important to keep an eye on long-term investments in your career, and having a solid foundation of knowledge and the ability to think for yourself is one of the most important investments you can make.</p> <p>P.S. It also helps to block social media and other non-essential distractions, at least during working hours. The quick hits of social media and of quick-answer seeking seem very similar to me, and I suspect one reinforces the other. I try to use a Chrome extension called <a href="https://chrome.google.com/webstore/detail/block-site/eiimnmioipafcokbfikbljfdeojpcgbh">BlockThisSite</a> to that end (though I admit I’m not 100% there yet; new habits take time to form).</p> <p>P.P.S. I think I wrote this more for myself than for anyone else. I tend to have a monkey brain and need to get these thoughts down and persisted somewhere as a reminder to focus. Reading the work of Cal Newport has helped a lot, though; I highly recommend it!</p> Fri, 21 Jul 2017 00:00:00 +0000 jpc2.org/2017/07/21/shallow-vs-deep-research.html