I’m honestly not sure if I’m using Ruby or abusing Ruby. https://t.co/S4YrWjTGzg

This is great news!

twitter.com/mattzap/status…

/via @garnaat

Season 2 of Puffin Rock is out on Netflix!

netflix.com/title/80044965

After 20 years this Boxster’s exterior in this common color is 😴 but that interior! 😮😎😍💯

newyork.craigslist.org/brk/cto/571793… https://t.co/w2bz863cIs

I vowed years ago to never write another Web framework… but I think I just did 😱

Maybe it’ll work out this time because it’s micro 🤔🤓🙄

Downloaded @google Duo, instantly deleted it — it wants to “periodically send your contacts to Google.”

HELL no. https://t.co/FlA7qjij5M

The Terraform team at @hashicorp should really close _just one_ issue: https://t.co/o4L9WFgB95

I am just epically bad at mornings. Giant blundering disaster. It’s time for me to give up.

@googledocs Drawings does not support lossless export, so despite its excellent usability & collaboration features, it is not recommended.

Oh wow the upcoming @OmniGraffle 7 looks fantastic!

Includes some huge features I’ve wanted for a long time!

😎💪

omnigroup.com/omnigraffle/pr…

Just posted a photo instagram.com/p/BJAj7ImDJ-D/

Loving the high-energy, excited vibe of @Werner at #AWSSummit NYC!

I’m at #AWSSummit NYC and @Werner just announced a new load balancer service: ALB, Application Load Balancers, for fine-grained balancing. 😎

Just wrote my first Node.js code with ES6 features. Not bad!

twitter.com/benjancewicz/s…

Dammit, we need to fix *elections* in the US before we can fix anything else!

There are not enough facepalm emojis in the world for a day wherein I need to work with Google APIs.

I’m feeling _really_ good about the toolset we’re using in engineering at @ParkAssist these days! (PT we’re hiring!) https://t.co/36wpJONpR1

Nothing puts a smile on my face quite like driving a sporty convertible on back roads on a beautiful sunny day.

New post: Code Review Rant

aviflax.com/post/code-revi…

Code Review Rant

I originally posted this rant on our internal message board at Park Assist as a response to a comment from one of my coworkers. They were concerned that using “Pull Requests” to facilitate, encourage, and propagate code review as a practice would yield a rigid, costly, and contentious process.

I completely agree with you that trust should be our default, and that we cannot and should not be spending any time nitpicking.

I also agree that we should avoid rigid rules and processes that lack nuance and preclude agency. I agree, too, about not overthinking up front and instead iterating on our work: accepting that no work is ever going to be perfect, so what matters is shipping it and improving it incrementally and iteratively.

And this is great, because now we’re talking about our values, our principles — which are much more important than and must precede any discussion of our process.

In that light, I’d like to share one of my core principles WRT software development: that code review — when done well — is incredibly valuable, helpful, and useful.

To be clear, by “code review” I don’t mean any specific process, but rather a general practice wherein most or all code is appraised, considered, understood, and discussed by at least 2 people, at some point either before or shortly after shipping. As I wrote above, there are many different ways to implement such a practice.

I believe such a practice is incredibly valuable in that:

  • Flaws are easier and cheaper to correct earlier rather than later
    • “flaws” includes bugs and flat-out errors, sure, but also things like poor readability, poor maintainability, rigidity, brittleness, etc.; at this point in our organization’s development I believe these are much more important than they once were
  • It disseminates and distributes knowledge
    • If only one person understands a given unit of code, then that’s a single point of failure. When that person leaves the organization, or is on vacation, or is out sick, or is just really busy, the ability of the organization to change the code is impaired. (I.e. it’s more costly.) And not just the ability to change it, but even the ability to assess potential changes, so as to make good decisions about whether or not, or when, to make changes. (This can also be expressed as a risk factor: the more single points of failure, the higher the risk of problems going unsolved or taking a long time to solve, raising the potential impact of those problems.)
    • If only one person understands a given unit of code, then in general it’ll be most cost-effective for that person to work on it in the future, rather than someone else. This is a compounding effect that leads to serious problems with distribution of labor.
  • It engenders organic, contextual, collaborative discussions on the development of our products and on software development in general
    • This is a big one. These discussions can cover pretty much anything and everything: coding style, system design, patterns, readability, efficiency, philosophy, etc.
    • These discussions are a fantastic way for people to exchange ideas on how to best do this work.
    • This leads to these ideas getting better, and to each person learning more and learning faster and leveling up their skills and experience faster.
    • This also leads to a given team eventually achieving a loose and thoughtful consensus on what they believe about software development, which leads to the systems being more consistent, which has all sorts of third-order benefits.

That excellent article by Bruce Johnson enumerated the benefits similarly:

  • An ounce of prevention is worth a pound of cure
  • A bionic cultural hammer
    • Code reviews promote openness
    • Code reviews raise team standards
    • Code reviews propel teamwork
    • Code reviews keep security top-of-mind
    • Code reviews frame social recognition
  • We shape our culture because it shapes us

If you haven’t had a chance to read it, I highly recommend it.

To paint a picture: imagine getting to sit down for a few minutes every day with a peer who is eager to see what you’ve been up to, excited to learn about it, and may even give some supportive, constructive, gentle suggestions for improving the work. Someone who loves geeking out on this stuff and wants to level up their own skills and just generally do great work and get better together.

I’ve experienced this, and it’s amazing, really fantastic when it’s working well. It takes effort and time to get there, but I believe it’s well worth it.

Hey, I’ve contributed something (minor) to @apachekafka!

github.com/apache/kafka/c…

I’ve been thinking about migrating my (insignificant) blog to a platform that doesn’t support comments, but just now got a useful comment. 🤔

“US air strike in Syria kills nearly 60 civilians ‘mistaken for Isil fighters’”

telegraph.co.uk/news/2016/07/1…

Fucking horrible. Fuck. Fuck!

Notes from the First Kafka Summit

I learned a lot at the first Kafka Summit, organized by Confluent, and I’d like to share some of it here. This is a repost from the internal tech blog at Park Assist, and it’s a bit out of date, as the conference was in April, but I figure it might have some value to someone stumbling across it.

High-level summary

  • There’s a shit-ton of work going on right now in streaming data, at pretty much every level
  • There are a ton of stream processing (SP) frameworks and libraries being very actively developed (several of them are covered below)
  • Samza didn’t seem to be present. Curious and worrying. I’ve sent an inquiry to one of its creators.
  • Kafka is working really well for lots of companies
  • But:
    • it still has some rough edges: deployment, operations, management, durability (with default settings)
    • they tend to be larger companies that can invest substantially in infrastructure, tooling, and operations
  • There’s a gap in the market for information, guidance, and services for small and medium companies
  • Heroku is stepping into that gap with a hosted Kafka service.
    • This is very exciting. I’m psyched to try it out.

Session Videos

Confluent has posted videos of every single session — very cool!

Some of my favorites:

And some sessions I missed but plan to watch:

Deployment, Operations, Management, and Tooling; or: Kafka Itself

I didn’t spend much time focused on these topics, but I did learn a few things:

  • larger companies that use Kafka heavily have built lots of tooling around it:
    • schema registries
    • topic registries
    • data dictionaries
    • proxies
    • client libraries
  • lots of companies have made lots of mistakes when using Kafka:
    • large mess of undifferentiated topics
      • no namespacing (e.g. prefixing)
      • no differentiating between public and private topics
    • problematic settings
      • it’s very easy to accidentally lose data
      • the replication defaults are no good (wtf); see the sketch after this list
  • my conclusions:
    • the cognitive load involved in operating Kafka well is currently high
      • Specifically in the case of a large cluster with robust replication and many topics and many clients
      • I think our use case for on-site installs will be simpler: almost always two machines right next to each other, no AZs, etc.
    • Utilizing Kafka and stream processing for a specific app or component or use case is fairly straightforward and as of this moment we have some pretty great options (modern producer, modern consumer, schema registry)
      • But using it as the “nervous system” for an entire company is a whole different thing; you need more documentation, more conventions (e.g. around data evolution), more tooling, etc.
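
To make those settings concrete, here’s a minimal sketch of a producer configured not to lose data, with the topic- and broker-level settings noted in comments. This is my own illustration rather than anything from the talks, and the bootstrap address is a placeholder:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class SafeProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            // The default (acks=1) considers a write successful once the leader
            // alone has it, which is one of the easy ways to silently lose data;
            // acks=all waits for all in-sync replicas to acknowledge.
            props.put(ProducerConfig.ACKS_CONFIG, "all");

            // Also needed, on the topic/broker side:
            //   replication.factor=3
            //   min.insync.replicas=2
            //   unclean.leader.election.enable=false
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // ... send records here
            }
        }
    }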

Stream Processing

  • A few different talks covered the principles and models involved in SP
    • there seems to be a converging consensus that we need models, APIs, and DSLs that support processing events based on event time rather than processing time, and that handle out-of-order events.
      • Flink (out now) supports this, and Beam, Spark 2, and Kafka Streams (all coming soon) will too
    • In addition to aligned windows and sliding windows, with which we’re probably familiar, another window type that was discussed was “session windows”, which could be super useful for our visits (see the sketch after this list).
      • If you squint, then a bay’s visits can be thought of as sessions.
    • The Beam folks are positing that we need models (APIs and DSLs) that have first-class support for early (tentative/speculative) results, on-time results, late results, and corrections.
      • They use something called a watermark to determine when to produce potentially on-time results
      • Each streaming transformation that uses aggregation (which is by necessity windowed) really should include a specification for how to handle “refinements”
      • Kafka Streams seems to support this, albeit maybe in a somewhat rigid way
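
To make the session idea concrete, here’s a minimal sketch in Java using Kafka Streams’ session-window support (which shipped after this post was written); the “bay-events” topic name and the five-minute inactivity gap are made-up placeholders:

    import java.time.Duration;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.SessionWindows;
    import org.apache.kafka.streams.kstream.Windowed;

    public class VisitSessions {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            // One record per bay event, keyed by bay ID
            KStream<String, String> bayEvents = builder.stream("bay-events");

            // A session closes after 5 minutes of inactivity; if you squint,
            // each session is one visit
            KTable<Windowed<String>, Long> visits = bayEvents
                .groupByKey()
                .windowedBy(SessionWindows.with(Duration.ofMinutes(5)))
                .count();

            // In a real app: write `visits` out and start a KafkaStreams instance
        }
    }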

Kafka Streams

  • SP framework that’s more of a library
  • Combines the Kafka Consumer and Producer with a stream processing DSL that makes sophisticated windowing, joining, and aggregation operations accessible
  • Supports event time and unordered events
  • Initial release (as part of Kafka 0.10) is targeted for some time this summer
  • Vision is to make it radically easier to integrate sophisticated stream processing into apps
  • Because it’s “just a library” you can create SPs and deploy them however you want — deployment is decoupled from use of the library
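
As a sketch of what that decoupling means in practice (the application id and topic names here are hypothetical), a Streams app is just a plain JVM process with a main method, deployed however you already deploy Java:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;

    public class JustALibrary {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "visit-counter");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

            StreamsBuilder builder = new StreamsBuilder();
            builder.stream("bay-events").to("bay-events-mirror"); // trivial pass-through topology

            // No cluster scheduler, no job submission: just start the process
            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }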

Beam

  • An open-source fork of the core model behind Google Cloud Dataflow
  • Appears to be aiming for an initial release in late summer
  • A core API that’s sort-of a DSL, for expressing stream processing operations with a high-level (declarative) syntax that can then be executed by an execution engine (a “runner”)
    • Sort-of like SQL but without the textual language — just a programmatic API for now
  • Includes runners for Cloud Dataflow, Flink, and Spark 1.
  • The Beam Model
    • What » Where » When » How
    • What are you computing?
    • Where in event time?
    • When in processing time?
    • How do refinements relate?
  • You specify (declare) those 4 things then the runner interprets them
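
Here’s how those four questions map onto code, in a hedged sketch using Beam’s Java API as it eventually settled; the window sizes, firing delays, and lateness bound are placeholders I chose, and `events` is assumed to carry event-time timestamps:

    import org.apache.beam.sdk.transforms.Combine;
    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.windowing.AfterPane;
    import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
    import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.PCollection;
    import org.joda.time.Duration;

    public class BeamModelSketch {
        static PCollection<Long> countPerWindow(PCollection<String> events) {
            return events
                // Where in event time: fixed one-minute windows
                .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
                    // When in processing time: early (speculative) firings,
                    // one on-time firing at the watermark, then late firings
                    .triggering(AfterWatermark.pastEndOfWindow()
                        .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                            .plusDelayOf(Duration.standardSeconds(30)))
                        .withLateFirings(AfterPane.elementCountAtLeast(1)))
                    .withAllowedLateness(Duration.standardMinutes(10))
                    // How do refinements relate: each pane accumulates the prior ones
                    .accumulatingFiredPanes())
                // What you're computing: a count per window
                .apply(Combine.globally(Count.<String>combineFn()).withoutDefaults());
        }
    }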

Spark 2

  • Under active development right now, shooting for release next month
  • Introduces a new API that unifies processing of bounded and unbounded data, supports event-time processing, and handles unordered events
  • Support for Kafka is coming soon, but might not be released at exactly the same time as Spark 2 itself
    • They’re considering unbundling it from the core Spark codebase and releasing it as a plugin
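
For flavor, a hedged sketch of what that unified API looked like once Spark 2 shipped, in Java; the socket source (with its includeTimestamp option) is just a stand-in, and the ten-minute window is arbitrary:

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.window;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class Spark2Sketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("event-counts")
                .master("local[*]") // local master, just for the sketch
                .getOrCreate();

            // An unbounded Dataset, manipulated with the same API as a bounded one
            Dataset<Row> events = spark.readStream()
                .format("socket")
                .option("host", "localhost")
                .option("port", 9999)
                .option("includeTimestamp", true) // adds a "timestamp" column
                .load();

            // Grouping by *event time*, so out-of-order rows still land
            // in the window they belong to
            Dataset<Row> counts = events
                .groupBy(window(col("timestamp"), "10 minutes"))
                .count();

            // In a real job you'd hand `counts` to writeStream and start the query
        }
    }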

I didn’t actually attend the Flink session, but it was mentioned many times in many different sessions.

Flink is on my radar for a few reasons:

  • It aims to be a comprehensive framework covering both bounded and unbounded data processing (just like Spark 2)
  • It supports event time and unordered events
  • Apache Beam includes a Flink runner that supports almost all of Beam’s semantics
    • Seems to me this might mean it would be relatively “easy” to migrate an SP operation from Flink’s API to Beam’s
  • It recently hit 1.0 so it should be more mature than both Kafka Streams and Spark 2

Conclusion

I’m more convinced than ever that Kafka, and the paradigm it embodies, can make our systems radically simpler, faster, more maintainable, and more agile. It’s still early days and it’s going to take time and hard work to realize that potential. Thankfully there’s a large, energized, robust community putting in that time and hard work to make it happen.

Just discovered a new favorite Dewey Decimal classification: 303.483. And 303.484! What a shelf.