A friendly alternative to the find tool in Linux

fd is a super fast, Rust-based alternative to the Unix/Linux find command. It does not mirror all of find's powerful functionality; however, it does provide just enough features to cover 80% of the use cases you might run into. Features like a well thought-out and convenient syntax, colorized output, smart case, regular expressions, and parallel command execution make fd a more than capable successor.

Installation

Head over to the fd GitHub page and check out the section on installation. It covers how to install the application on macOS, Debian/Ubuntu, Red Hat, and Arch Linux. Once installed, you can get a complete overview of all available command-line options by running fd -h for concise help, or fd --help for more detailed help.

Simple search

fd is designed to help you easily find files and folders in your operating system’s filesystem. The simplest search you can perform is to run fd with a single argument, that argument being whatever it is that you’re searching for. For example, let’s assume that you want to find a Markdown document that has the word services as part of the filename:

$ fd services
downloads/services.md

If called with just a single argument, fd searches the current directory recursively for any files and/or directories that match your argument. The equivalent search using the built-in find command looks something like this:

$ find . -name '*services*'
downloads/services.md

As you can see, fd is much simpler and requires less typing. Getting more done with less typing is always a win in my book.

Files and folders

You can restrict your search to files or directories by using the -t argument, followed by the letter that represents what you want to search for. For example, to find all files in the current directory that have services in the filename, you would use:

$ fd -tf services
downloads/services.md

And to find all directories in the current directory that have services in the filename:

$ fd -td services
applications/services
library/services

How about listing all documents with the .md extension in the current folder?

$ fd .md
administration/administration.md
development/elixir/elixir_install.md
readme.md
sidebar.md
linux.md

As you can see from the output, fd not only found and listed files from the current folder, but it also found files in subfolders. Pretty neat. You can even search for hidden files using the -H argument:

$ fd -H sessions .
.bash_sessions
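Because fd also supports parallel command execution (one of the features mentioned at the start), you can act on whatever it finds. As a quick sketch using the Markdown files from the earlier listing, the -e option filters results by file extension and -x runs a command once for each result:

$ fd -e md -x wc -l

This counts the lines of every Markdown file below the current directory, running the wc commands in parallel.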

Specifying a directory

If you want to search a specific directory, the name of the directory can be given as a second argument to fd:

$ fd passwd /etc
/etc/default/passwd
/etc/pam.d/passwd
/etc/passwd

In this example, we’re telling fd that we want to search for all instances of the string passwd in the /etc directory.

Global searches

What if you know part of the filename but not the folder? Let’s say you downloaded a book on Linux network administration but you have no idea where it was saved. No problem:

$ fd Administration /
/Users/pmullins/Documents/Books/Linux/Mastering Linux Network Administration.epub

Wrapping up

The fd utility is an excellent replacement for the find command, and I’m sure you’ll find it just as useful as I do. To learn more about the command, simply explore the rather extensive man page.

Getting started with Buildah

Buildah is a command-line tool for building Open Container Initiative-compatible (that means Docker- and Kubernetes-compatible, too) images quickly and easily. It can act as a drop-in replacement for the Docker daemon’s docker build command (i.e., building images with a traditional Dockerfile) but is flexible enough to allow you to build images with whatever tools you prefer to use. Buildah is easy to incorporate into scripts and build pipelines, and best of all, it doesn’t require a running container daemon to build its image.

A drop-in replacement for docker build

You can get started with Buildah immediately, dropping it into place where images are currently built using a Dockerfile and docker build. Buildah’s build-using-dockerfile (or bud for short) subcommand makes it behave just like docker build does, so it’s easy to incorporate into existing scripts or build pipelines.

As with previous articles I’ve written about Buildah, I like to use the example of installing “GNU Hello” from source. Consider this Dockerfile:

FROM fedora:28
LABEL maintainer Chris Collins

RUN dnf install -y tar gzip gcc make \
        && dnf clean all

ADD http://ftpmirror.gnu.org/hello/hello-2.10.tar.gz /tmp/hello-2.10.tar.gz

RUN tar xvzf /tmp/hello-2.10.tar.gz -C /opt

WORKDIR /opt/hello-2.10

RUN ./configure
RUN make
RUN make install
RUN hello -v
ENTRYPOINT "/usr/local/bin/hello"

Buildah can create an image from this Dockerfile as easily as buildah bud -t hello ., replacing docker build -t hello .:

[chris@krang] $ sudo buildah bud -t hello .
STEP 1: FROM fedora:28
Getting image source signatures
Copying blob sha256:e06fd16225608e5b92ebe226185edb7422c3f581755deadf1312c6b14041fe73
 81.48 MiB / 81.48 MiB [====================================================] 8s
Copying config sha256:30190780b56e33521971b0213810005a69051d720b73154c6e473c1a07ebd609
 2.29 KiB / 2.29 KiB [======================================================] 0s
Writing manifest to image destination
Storing signatures
STEP 2: LABEL maintainer Chris Collins
STEP 3: RUN dnf install -y tar gzip gcc make    && dnf clean all

<snip>

Once the build is complete, you can see the new image with the buildah images command:

[chris@krang] $ sudo buildah images
IMAGE ID        IMAGE NAME                              CREATED AT              SIZE
30190780b56e    docker.io/library/fedora:28             Mar 7, 2018 16:53       247 MB
6d54bef73e63    docker.io/library/hello:latest          May 3, 2018 15:24       391.8 MB

The new image, tagged hello:latest, can be pushed to a remote image registry or run using CRI-O or other Kubernetes CRI-compatible runtimes. If you’re testing it as a replacement for docker build, you will probably want to copy the image to the Docker daemon’s local image storage so it can be run by Docker. This is easily accomplished with the buildah push command:

[chris@krang] $ sudo buildah push hello:latest docker-daemon:hello:latest
Getting image source signatures
Copying blob sha256:72fcdba8cff9f105a61370d930d7f184702eeea634ac986da0105d8422a17028
 247.02 MiB / 247.02 MiB [==================================================] 2s
Copying blob sha256:e567905cf805891b514af250400cc75db3cb47d61219750e0db047c5308bd916
 144.75 MiB / 144.75 MiB [==================================================] 1s
Copying config sha256:6d54bef73e638f2e2dd8b7bf1c4dfa26e7ed1188f1113ee787893e23151ff3ff
 1.59 KiB / 1.59 KiB [======================================================] 0s
Writing manifest to image destination
Storing signatures

[chris@krang] $ sudo docker images | head -n2
REPOSITORY              TAG             IMAGE ID        CREATED                 SIZE
docker.io/hello         latest          6d54bef73e63    2 minutes ago           398 MB

[chris@krang] $ sudo docker run -t hello:latest
Hello, world!

A few differences

Unlike Docker build, Buildah doesn’t commit changes to a layer automatically for every instruction in the Dockerfile—it builds everything from top to bottom, every time. On the positive side, this means non-cached builds (for example, those you would do with automation or build pipelines) end up being somewhat faster than their Docker build counterparts, especially if there are a lot of instructions. This is great for getting new changes into production quickly from an automated deployment or continuous delivery standpoint.

Practically speaking, however, the lack of caching may not be quite as useful for image development, where caching layers can save significant time when doing builds over and over again. This applies only to the build-using-dockerfile command, however. When using Buildah native commands, as we’ll see below, you can choose when to commit your changes to disk, allowing for more flexible development.

Buildah native commands

Where Buildah really shines is in its native commands, which you can use to interact with container builds. Rather than using build-using-dockerfile/bud for each build, Buildah has commands to actually interact with the temporary container created during the build process. (Docker uses temporary, or intermediate, containers too, but you don’t really interact with them while the image is being built.)

Using the “GNU Hello” example again, consider this image build using Buildah commands:

#!/usr/bin/env bash

set -o errexit

# Create a container
container=$(buildah from fedora:28)

# Labels are part of the "buildah config" command
buildah config --label maintainer="Chris Collins <collins.christopher@gmail.com>" $container

# Grab the source code outside of the container
curl -sSL http://ftpmirror.gnu.org/hello/hello-2.10.tar.gz -o hello-2.10.tar.gz

buildah copy $container hello-2.10.tar.gz /tmp/hello-2.10.tar.gz

buildah run $container dnf install -y tar gzip gcc make
buildah run $container dnf clean all
buildah run $container tar xvzf /tmp/hello-2.10.tar.gz -C /opt

# Workingdir is also a "buildah config" command
buildah config --workingdir /opt/hello-2.10 $container

buildah run $container ./configure
buildah run $container make
buildah run $container make install
buildah run $container hello -v

# Entrypoint, too, is a "buildah config" command
buildah config --entrypoint /usr/local/bin/hello $container

# Finally saves the running container to an image
buildah commit --format docker $container hello:latest

One thing that should be immediately obvious is the fact that this is a Bash script rather than a Dockerfile. Using Buildah’s native commands makes it easy to script, in whatever language or automation context you like to use. This could be a makefile, a Python script, or whatever tools you like to use.

So what is going on here? The first Buildah command, container=$(buildah from fedora:28), creates a running container from the fedora:28 image and stores the container name (the output of the command) in a variable for later use. All the rest of the Buildah commands use the $container variable to specify which container to act upon. For the most part those commands are self-explanatory: buildah copy copies a file into the container, and buildah run executes a command in the container. It is easy to match them to their Dockerfile equivalents.

The final command, buildah commit, commits the container to an image on disk. When building images with Buildah commands rather than from a Dockerfile, you can use the commit command to decide when to save your changes. In the example above, all of the changes are committed at once, but intermediate commits could be included too, allowing you to choose cache points from which to start. (For example, it would be particularly useful to cache to disk after the dnf install, as that can take a long time, but is also reliably the same each time.)
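As a sketch of what such an intermediate commit could look like in the script above (hello-base is just an illustrative tag, not part of the original script), an extra commit right after the package installation would give you a reusable cache point:

buildah commit --format docker $container hello-base:latest

Later builds could then start with buildah from hello-base:latest instead of fedora:28 and skip the dnf install step entirely.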

Mountpoints, install directories, and chroots

Another useful Buildah command opens the door to a lot of flexibility in building images. buildah mount mounts the root directory of a container to a mountpoint on your host. For example:

[chris@krang] $ container=$(sudo buildah from fedora:28)
[chris@krang] $ mountpoint=$(sudo buildah mount ${container})
[chris@krang] $ echo $mountpoint
/var/lib/containers/storage/overlay2/463eda71ec74713d8cebbe41ee07da5f6df41c636f65139a7bd17b24a0e845e3/merged
[chris@krang] $ cat ${mountpoint}/etc/redhat-release
Fedora release 28 (Twenty Eight)
[chris@krang] $ ls ${mountpoint}
bin   dev  home  lib64       media  opt   root  sbin  sys  usr
boot  etc  lib   lost+found  mnt    proc  run   srv   tmp  var

This is great because now you can interact with the mountpoint to make changes to your container image. This allows you to use tools on your host to build and install software, rather than including those tools in the container image itself. For example, in the Bash script above, we needed to install the tar, Gzip, GCC, and make packages to compile “GNU Hello” inside the container. Using a mountpoint, we can build an image with the same software, but the downloaded tarball and tar, Gzip, etc., RPMs are all on the host machine rather than in the container and resulting image:

#!/usr/bin/env bash

set -o errexit

# Create a container
container=$(buildah from fedora:28)
mountpoint=$(buildah mount $container)

buildah config --label maintainer="Chris Collins <collins.christopher@gmail.com>" $container

curl -sSL http://ftpmirror.gnu.org/hello/hello-2.10.tar.gz \
     -o /tmp/hello-2.10.tar.gz
tar xvzf /tmp/hello-2.10.tar.gz -C ${mountpoint}/opt

pushd ${mountpoint}/opt/hello-2.10
./configure
make
make install DESTDIR=${mountpoint}
popd

chroot $mountpoint bash -c "/usr/local/bin/hello -v"

buildah config --entrypoint "/usr/local/bin/hello" $container
buildah commit --format docker $container hello
buildah unmount $container

Take note of a few things in the script above:

  1. The curl command downloads the tarball to the host, not the image

  2. The tar command (running from the host itself) extracts the source code from the tarball into /opt inside the container.

  3. The configure, make, and make install commands all run from a directory inside the mountpoint on the host, rather than running inside the container itself.

  4. The chroot command here is used to change root into the mountpoint itself and test that “hello” is working, similar to the buildah run command used in the previous example.

This script is shorter, it uses tools most Linux folks are already familiar with, and the resulting image is smaller (no tarball, no extra packages, etc). You could even use the package manager for the host system to install software into the container. For example, let’s say you wanted to install NGINX into the container with GNU Hello (for whatever reason):

[chris@krang] $ mountpoint=$(sudo buildah mount ${container})
[chris@krang] $ sudo dnf install nginx --installroot $mountpoint
[chris@krang] $ sudo chroot $mountpoint nginx -v
nginx version: nginx/1.12.1

In the example above, DNF is used with the --installroot flag to install NGINX into the container, which can be verified with chroot.
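As in the earlier script, once the mountpoint contains what you want, a commit and an unmount finish the job (hello-nginx is just an illustrative image name):

[chris@krang] $ sudo buildah commit --format docker $container hello-nginx
[chris@krang] $ sudo buildah unmount $container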

Try it out!

Buildah is a lightweight and flexible way to create container images without running a full Docker daemon on your host. In addition to offering out-of-the-box support for building from Dockerfiles, Buildah is easy to use with scripts or build tools of your choice and can help build container images using existing tools on the build host. The result is leaner images that use less bandwidth to ship around, require less storage space, and have a smaller surface area for potential attackers. Give it a try!

[See our related story, Creating small containers with Buildah]

Getting started with regular expressions

Regular expressions can be one of the most powerful tools in your toolbox as a Linux user, system administrator, or even programmer. They can also be one of the most daunting things to learn, but it doesn’t have to be that way! While there are an infinite number of ways to write an expression, you don’t have to learn every single switch and flag. In this short how-to, I’ll show you a few simple ways to use regex that will have you up and running in no time and share some follow-up resources that will make you a regex master if you want to be.

A quick overview

Regular expressions, also referred to as “regex” patterns or even “regular statements,” are in simple terms “a sequence of characters that define a search pattern.” The idea came about in the 1950s when Stephen Cole Kleene wrote a description of an idea he called a “regular language,” of which part came to be known as “Kleene’s theorem.” At a very high level, it says if the elements of the language can be defined, then an expression can be written to match patterns within that language.

Since then, regular expressions have been part of even the earliest Unix programs, including vi, sed, awk, grep, and others. In fact, the word grep is derived from the command that was used in the earliest “ed” editor, namely g/re/p, which essentially means “do a global search for this regular expression and print the lines.” Cool!

Why we need regular expressions

As mentioned above, regular expressions are used to define a pattern to help us match on or “find” objects that match that pattern. Those objects can be files in a filesystem when using the find command for instance, or a block of text in a file which we might search using grep, awk, vi, or sed, for example.

Start with the basics

Let’s start at the very beginning; it’s a very good place to start.

The first regex everyone seems to learn is probably one you already know and didn’t realize what it was. Have you ever wanted to print out a list of files in a directory, but it was too long? Maybe you’ve seen someone type *.gif to list GIF images in a directory, like:

$ ls *.gif

That’s a regular expression!

When writing regular expressions, certain characters have special meaning to allow us to move beyond matching just characters to matching entire sets of characters. In this case, the * character, also called “star” or “splat,” stands in for any sequence of characters, which lets you match all files ending with .gif.

Search for patterns in a file

The next step in your regex foo training is searching for patterns within a file, especially using the replace pattern to make quick changes.

Two common ways to do this are:

  1. Use vi to open the file, search for a pattern, and make the change (even automatically using replace).
  2. Use the “stream editor,” aka sed, to programmatically search within the file and make the change.

Let’s start by learning some regex by using vi to edit the following file:

The quick brown fox jumped over the lazy dog.
Simple test
Harder test
Extreme test case
ABC 123 abc 567
The dog is lazy

Now, with this file open in vi, let’s look at some regex examples that will help us find some matching strings inside and even replace them automatically.

To make things easier, let’s set vi to ignore case. Type :set ic to enable case-insensitive searching.

Now, to start searching in vi, type the / character followed by your search pattern.

Search for things at the beginning or end of a line

To find a line that starts with “Simple,” use this regex pattern:

/^Simple

Notice in the image below that only the line starting with “Simple” is highlighted. The carat symbol (^) is the regex equivalent of “starts with.”

'Simple' highlighted

Next, let’s use the $ symbol, which in regex speak means “ends with.” To find the lines that end with “test,” search for:

/test$

'Test' highlighted

See how it highlights both lines that end in “test”? Also, notice that the fourth line has the word test in it, but not at the end, so this line is not highlighted.

This is the power of regular expressions, giving you the ability to quickly look across a great number of matches with ease but specifically drill down on only exact matches.

Test for the frequency of occurrence

To further extend your skills in regular expressions, let’s take a look at some more common special characters that allow us to look for not just matching text, but also patterns of matches.

Frequency matching characters:

Character   Meaning           Example
*           Zero or more      ab* matches the letter a followed by zero or more b's
+           One or more       ab+ matches the letter a followed by one or more b's
?           Zero or one       ab? matches the letter a followed by zero or one b
{n}         Exactly n         ab{2} matches the letter a followed by exactly two b's
{n,}        At least n        ab{2,} matches the letter a followed by at least two b's
{n,m}       Between n and m   ab{1,3} matches the letter a followed by one to three b's

Find classes of characters

The next step in regex training is to use classes of characters in our pattern matching. What’s important to note here is that these classes can be combined either as a list of characters, such as [adxz], or as a range, such as [a-z], and that characters are usually case sensitive.

To see this work in vi, we’ll need to turn off the ignore case setting we enabled earlier. Type :set noic to turn ignore case off again.

Some common classes of characters that are used as ranges are:

  • a-z – all lowercase characters
  • A-Z – all UPPERCASE characters
  • 0-9 – numbers

Now, let’s try a search similar to one we ran earlier:

/tT

Do you notice that it finds nothing? That’s because the previous regex looks for the exact sequence “tT.” If we replace it with a character class:

/[tT]

We’ll see that both the lowercase and UPPERCASE T’s are matched across the document.

Letter 't' highlighted

Now, let’s chain a couple of class ranges together and see what we get. Try:

/[A-Z0-9]

capital letters and 123 are highlighted

Notice that the capital letters and the numbers (123 and the 567 at the end of line five) are highlighted, but not the lowercase letters.

Flags

The last step in your beginning regex training is to understand flags that exist to search for special types of characters without needing to list them in a range.

  • . – any character
  • \s – whitespace
  • \w – word
  • \d – digit (number)

For example, to find all digits in the example text, use:

/\d

Notice in the example below that all of the numbers are highlighted.

numbers are highlighted

To match on the opposite, you usually use the same flag, but in UPPERCASE. For example:

  • \S – not a space
  • \W – not a word
  • \D – not a digit

Notice in the example below that by using \D, all characters EXCEPT the numbers are highlighted.

all characters EXCEPT the numbers are highlighted

Searching with sed

A quick note on sed: It’s a stream editor, which means you don’t interact with a user interface. It takes the stream coming in one side and writes it out the other side.

Using sed is very similar to vi, except that you give it the regex to search and replace, and it returns the output. For example:

sed s/dog/cat/ examples

will return the following to the screen:

The quick brown fox jumped over the lazy cat.
Simple test
Harder test
Extreme test case
ABC 123 abc 567
The cat is lazy

If you want to save that change, it’s only slightly more tricky. You’ll need to chain a couple of commands together to a) write the output to a temporary file, and b) copy it over the top of the original file.

To do this, try:

sed s/dog/cat/ examples > temp.out; mv temp.out examples
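If your version of sed supports it (GNU sed does; macOS/BSD sed needs a slightly different form), the -i option is a convenient shortcut that makes the same change in place and skips the temporary file:

sed -i s/dog/cat/ examples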

Now, if you look at your examples file, you’ll see that the word “dog” has been replaced.

The quick brown fox jumped over the lazy cat.
Simple test
Harder test
Extreme test case
ABC 123 abc 567
The cat is lazy
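These same patterns work outside of vi and sed, too. As a quick sketch (assuming the sample text is saved in a file named examples, as above), grep prints only the lines that match:

grep '^Simple' examples
grep 'test$' examples
grep '[0-9]' examples

The first command prints the line that starts with “Simple,” the second prints the lines that end with “test,” and the third prints every line containing a digit.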

For more information

I hope this was a helpful overview of regular expressions. Of course, this is just the tip of the iceberg, and I hope you’ll continue to learn about this powerful tool by reviewing the additional resources below.

Where to get help

For more examples, check out

Choosing the right open source tool for movie project management

One thing artists, engineers, and hackers have in common is their antipathy for management. So when the time comes that we actually need project management, it can be a painful growing experience.

For the Lunatics! animated open movie project, we started by using basic tools popular with open source software projects, like a version control system (Subversion), a wiki (MediaWiki), and a bug-tracker and online browser for the source code (Trac). This is viable for a team of a half-dozen people and an unhurried schedule on a volunteer project. But it quickly becomes unmanageable for larger teams and tighter schedules.

Fortunately, there are plenty of open source project management software packages, which can provide structural guidance and hold a lot more information about your project than you can comfortably keep in your own head, freeing you to apply yourself more creatively. The challenge is choosing the right package. And for that, we need to think more carefully about what we want from it.

My previous article dealt with the first and most concrete aspect of this problem: digital asset management. But even more important are the people working on the project and how they apply their time and resources to it, so we have to define what we need for that.

Defining what we need to manage

Breakdown

Project management starts with breaking down the big goal into lots of smaller goals—ideally down to the individual assets needed. This is called the breakdown.

Breakdown can be done by reviewing the script and identifying and listing the elements needed to produce each scene. At its simplest, it can be done in a text editor, but more streamlined solutions can speed things up.

Workflow

Once you have broken down the film into its individual assets, each asset will have to go through key phases of production—for example, a 3D model will need to be designed, modeled, textured, rigged, and animated. Each step might be done by a different person with specialized skills, so the asset will have to move from person to person.

Since asset formats (such as our Blender files) often can’t be merged if two people try to work on a file at once, it’s important to keep track of each asset’s phase and who has control of it. If you mess up and produce two parallel, out-of-sync versions of the file, you’ll probably have to ditch one of them and repeat the other work.

Scheduling and time management

Productions run on timetables. You want to be able to tell people when you will be finished, and you want to finish first things first.

You also may need to identify specific times when you can meet to discuss the project, and—depending on the terms of collaboration—you may need to keep track of the time spent by collaborators on the project.

Until now, we’ve handled most of these tasks through simple text files or LibreOffice Calc spreadsheets, in some cases shared through a MediaWiki site.

Communications

A key problem to solve for a team mediated by the internet is how to maintain context for conversations: you need everyone involved to know what you are talking about.

Much of the time spent on communications involves communicating the context of the conversation—what project, asset, or task are we talking about? We’ve done that using GIMP or Inkscape to produce quick markup images that we share by chat, email, or a phpBB forum.

A few things can be done to speed that up. Blender contains its own internal markup system, called Grease Pencil. It isn’t much faster to use than sketching over a screen capture, although it does work better in 3D; in fact, it’s so sophisticated that people have produced short animated films using it artistically.

We’ve considered using videoconferencing and digital whiteboard package Big Blue Button (on GitHub) for team communications, but it’s probably overkill for our project.

New platform options

To step up from our existing Trac site, we might first consider Trac-like alternatives for managing the project, such as Redmine, which would add several new project management tools, including search, workflow, and scheduling features in addition to handling multiple projects.

We could also look at what other projects are using. Blender Foundation runs a Software-as-a-Service subscription platform for open movies called Blender Cloud. Its core project management software is Attract (see its development site). This is tightly integrated with Blender and provides an API that can be accessed from Blender. It’s definitely an attractive option for a Blender-centered project.

Morevna Project has experimented with dotProject (development on GitHub) in the past and more recently with Open Project.

Urchn.org’s “Tube” project has been using Helga for years, but it is essentially orphaned now (see its development on the Internet Archive).

For business reasons, we are also considering installing an open source enterprise platform called Odoo (previously known as OpenERP), which includes the Odoo Project (with development on GitHub). That would potentially be an easy add for us as well.

Wikipedia offers a comparison of project management packages, of which 31 are open source. Aside from the ones mentioned above, a few stand out as interesting.

ProjeQtor and TACTIC are among the most full-featured options on that list.

As mentioned in my previous Opensource.com article about asset management, TACTIC was a competitor for the production-management software used with the Blender Gooseberry Project (producing Cosmos Laundromat) before Blender Foundation decided to create a custom solution.

We chose the TACTIC platform because it is:

  • Designed specifically for animation production
  • Highly flexible in terms of workflow, scheduling, and collaboration features and allows template-based, per-project assignment of workflows and asset types
  • Tightly coupled with the digital asset management system, automatically associating tickets, workflow, schedules, and conversations within the context of each asset
  • Neutral on the choice of creative application (web-based interfaces)
  • Easy to integrate with clients through its web API
  • Written in Python, which is a clearly understandable language we have the skills to work with
  • Quite complete in available project management reports and features

Combined with Odoo for business commerce applications and Mumble for real-time voice communications, our new TACTIC platform should allow us to meet our goals of speeding production and growing our team to manage it.

How the four components of a distributed tracing system work together

Ten years ago, essentially the only people thinking hard about distributed tracing were academics and a handful of large internet companies. Today, it’s turned into table stakes for any organization adopting microservices. The rationale is well-established: microservices fail in surprising and often spectacular ways, and distributed tracing is the best way to describe and diagnose those failures.

That said, if you set out to integrate distributed tracing into your own application, you’ll quickly realize that the term “Distributed Tracing” means different things to different people. Furthermore, the tracing ecosystem is crowded with partially-overlapping projects with similar charters. This article describes the four (potentially) independent components in distributed tracing, and how they fit together.

Distributed tracing: A mental model

Most mental models for tracing descend from Google’s Dapper paper. OpenTracing uses similar nouns and verbs, so we will borrow the terms from that project:


  • Trace: The description of a transaction as it moves through a distributed system.
  • Span: A named, timed operation representing a piece of the workflow. Spans accept key:value tags as well as fine-grained, timestamped, structured logs attached to the particular span instance.
  • Span context: Trace information that accompanies the distributed transaction, including when it passes from service to service over the network or through a message bus. The span context contains the trace identifier, span identifier, and any other data that the tracing system needs to propagate to the downstream service.

If you would like to dig into a detailed description of this mental model, please check out the OpenTracing specification.
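To make those nouns a bit more concrete, here is a minimal sketch using the OpenTracing Python API (assuming the opentracing package; the operation name, tag, and carrier below are illustrative, and the default tracer is a no-op, so the snippet runs even without a real tracing backend):

import opentracing

tracer = opentracing.tracer  # the global tracer; a no-op unless a real one is registered

# A span is a named, timed operation; tags and logs attach to this span instance.
with tracer.start_span("fetch-user") as span:
    span.set_tag("user.id", 42)
    span.log_kv({"event": "cache-miss"})

    # The span context is the part that crosses process boundaries,
    # for example injected into HTTP headers for the downstream service.
    carrier = {}
    tracer.inject(span.context, opentracing.Format.HTTP_HEADERS, carrier)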

The four big pieces

From the perspective of an application-layer distributed tracing system, a modern software system looks like the following diagram:

Tracing

The components in a modern software system can be broken down into three categories:

  • Application and business logic: Your code.
  • Widely shared libraries: Other people’s code.
  • Widely shared services: Other people’s infrastructure.

These three components have different requirements and drive the design of the distributed tracing system that is tasked with monitoring the application. The resulting design yields four important pieces:

  • A tracing instrumentation API: What decorates application code.
  • Wire protocol: What gets sent alongside application data in RPC requests.
  • Data protocol: What gets sent asynchronously (out-of-band) to your analysis system.
  • Analysis system: A database and interactive UI for working with the trace data.

To explain this further, we’ll dig into the details which drive this design. If you just want my suggestions, please skip to the four big solutions at the bottom.

Requirements, details, and explanations

Application code, shared libraries, and shared services have notable operational differences, which heavily influence the requirements for instrumenting them.

Instrumenting application code and business logic

In any particular microservice, the bulk of the code written by the microservice developer is the application or business logic. This is the code that defines domain-specific operations; typically, it contains whatever special, unique logic justified the creation of a new microservice in the first place. Almost by definition, this code is usually not shared or otherwise present in more than one service.

That said, you still need to understand it, and that means it needs to be instrumented somehow. Some monitoring and tracing analysis systems auto-instrument code using black-box agents, and others expect explicit “white-box” instrumentation. For the latter, abstract tracing APIs offer many practical advantages for microservice-specific application code:

  • An abstract API allows you to swap in new monitoring tools without re-writing instrumentation code. You may want to change cloud providers, vendors, and monitoring technologies, and a huge pile of non-portable instrumentation code would add meaningful overhead and friction to that procedure.
  • It turns out there are other interesting uses for instrumentation, beyond production monitoring. There are existing projects that use this same tracing instrumentation to power testing tools, distributed debuggers, “chaos engineering” fault injectors, and other meta-applications.
  • But most importantly, what if you wanted to extract an application component into a shared library? That leads us to:

Instrumenting shared libraries

The utility code present in most applications—code that handles network requests, database calls, disk writes, threading, queueing, concurrency management, and so on—is often generic and not specific to any particular application. This code is packaged up into libraries and frameworks which are then installed in many microservices, and deployed into many different environments.

This is the real difference: with shared code, someone else is the user, and most users have different dependencies and operational styles. If you attempt to instrument this shared code, you will note a few common issues:

  • You need an API to write instrumentation. However, your library does not know what analysis system is being used. There are many choices, and all the libraries running in the same application cannot make incompatible choices.
  • The task of injecting and extracting span contexts from request headers often falls on RPC libraries, since those packages encapsulate all network-handling code. However, a shared library cannot know which tracing protocol is being used by each application.
  • Finally, you don’t want to force conflicting dependencies on your user. Even if they use gRPC, will it be the same version of gRPC you are binding to? So any monitoring API your library brings in for tracing must be free of dependencies.

So, an abstract API which (a) has no dependencies, (b) is wire protocol agnostic, and (c) works with popular vendors and analysis systems should be a requirement for instrumenting shared library code.

Instrumenting shared services

Finally, sometimes entire services—or sets of microservices—are general-purpose enough that they are used by many independent applications. These shared services are often hosted and managed by third parties. Examples might be cache servers, message queues, and databases.

It’s important to understand that shared services are essentially “black boxes” from the perspective of application developers. It is not possible to inject your application’s monitoring solution into a shared service. Instead, the hosted service often runs its own monitoring solution.

The four big solutions

So, an abstracted tracing API would help libraries emit data and inject/extract Span Context. A standard wire protocol would help black-box services interconnect, and a standard data format would help separate analysis systems consolidate their data. Let’s have a look at some promising options for solving these problems.

Tracing API: The OpenTracing project

As shown above, in order to instrument application code, a tracing API is required. And in order to extend that instrumentation to shared libraries, where most of the Span Context injection and extraction occurs, the API must be abstracted in certain critical ways.

The OpenTracing project aims to solve this problem for library developers. OpenTracing is a vendor-neutral tracing API which comes with no dependencies, and is quickly gaining support from a large number of monitoring systems. This means that, increasingly, if libraries ship with native OpenTracing instrumentation baked in, tracing will automatically be enabled when a monitoring system connects at application startup.

Personally, as someone who has been writing, shipping, and operating open source software for over a decade, it is profoundly satisfying to work on the OpenTracing project and finally scratch this observability itch.

In addition to the API, the OpenTracing project maintains a growing list of contributed instrumentation, some of which can be found here. If you would like to get involved, either by contributing an instrumentation plugin, natively instrumenting your own OSS libraries, or just want to ask a question, please find us on Gitter and say hi.

Wire Protocol: The trace-context HTTP headers

In order for monitoring systems to interoperate, and to mitigate migration issues when changing from one monitoring system to another, a standard wire protocol is needed for propagating Span Context.

The w3c Distributed Trace Context Community Group is hard at work defining this standard. Currently, the focus is on defining a set of standard HTTP headers. The latest draft of the specification can be found here. If you have questions for this group, the mailing list and Gitter chatroom are great places to go for answers.

Data protocol (Doesn’t exist yet!!)

For black-box services, where it is not possible to install a tracer or otherwise interact with the program, a data protocol is needed to export data from the system.

Work on this data format and protocol is currently at an early stage, and mostly happening within the context of the w3c Distributed Trace Context Working Group. There is particular interest in defining higher-level concepts, such as RPC calls, database statements, etc., in a standard data schema. This would allow tracing systems to make assumptions about what kind of data would be available. The OpenTracing project is also working on this issue, by starting to define a standard set of tags. The plan is for these two efforts to dovetail with each other.

Note that there is a middle ground available at the moment. For “network appliances” that the application developer operates, but does not want to compile or otherwise perform code modifications to, dynamic linking can help. The primary examples of this are service meshes and proxies, such as Envoy or NGINX. For this situation, an OpenTracing-compliant tracer can be compiled as a shared object, and then dynamically linked into the executable at runtime. This option is currently provided by the C++ OpenTracing API. For Java, an OpenTracing Tracer Resolver is also under development.

These solutions work well for services that support dynamic linking, and are deployed by the application developer. But in the long run, a standard data protocol may solve this problem more broadly.

Analysis system: A service for extracting insights from trace data

Last but not least, there is now a cornucopia of tracing and monitoring solutions. A list of monitoring systems known to be compatible with OpenTracing can be found here, but there are many more options out there. I would encourage you to research your options, and I hope you find the framework provided in this article to be useful when comparing options. In addition to rating monitoring systems based on their operational characteristics (not to mention whether you like the UI and features), make sure you think about the three big pieces above, their relative importance to you, and how the tracing system you are interested in provides a solution to them.

Conclusion

In the end, how important each piece is depends heavily on who you are and what kind of system you are building. For example, open source library authors are very interested in the OpenTracing API, while service developers tend to be more interested in the Trace-Context specification. When someone says one piece is more important than the other, they usually mean “one piece is more important to me than the other.”

However, the reality is this: distributed tracing has become a necessity for monitoring modern systems. In designing the building blocks for these systems, the age-old approach—”decouple where you can”—still holds true. Cleanly decoupled components are the best way to maintain flexibility and forward compatibility when building a system as cross-cutting as a distributed monitoring system.

Thanks for reading! Hopefully, now when you’re ready to implement tracing in your own application, you have a guide to understanding which pieces people are talking about and how they fit together.


Want to learn more? Sign up to attend KubeCon EU in May or KubeCon North America in December.

A brief history of bad passwords

IT-mandated password policies seem like a good idea—after all, what are the chances that an attacker will guess your exact passcode out of the 782 million potential combinations in an eight-character string with at least one upper-case letter, one lower-case letter, two numerals, and one symbol? 

Those odds are not in your favor because most IT password policies don’t consider how people select and use passwords in the real world, says Kyle Rankin, chief security officer at Purism and author of Linux Hardening in Hostile Networks. Password policies don’t work because hackers do consider how people think.

Watch Kyle’s Lightning Talk, “Sex, Secret, and God: A Brief History of Bad Passwords,” from the 16th annual Southern California Linux Expo (SCALE) to learn the history of bad passcode policies and what we must do instead to secure our data.

During the UpSCALE Lightning Talks hosted by Opensource.com at the 16th annual Southern California Linux Expo (SCALE) in March 2018, eight presenters shared quick takes on interesting open source topics, projects, and ideas. Watch all of the UpSCALE Lightning Talks on the Opensource.com YouTube channel.

Linux video editing, open source ERP systems, Windows apps, password managers, and more

Our biggest hit last week was Don Watkins’ article on why System76 will start making its Linux computers in the U.S. Here’s more of what readers were talking about the week of April 9-15:

  1. Linux computer maker to move manufacturing to the U.S., by Don Watkins
  2. Top 9 open source ERP systems to consider, by Opensource.com
  3. The current state of Linux video editing 2018, by Seth Kenlon
  4. 3 password managers for the Linux command line, by Scott Nesbitt
  5. 3 open source apps for Windows, by Jeff Macharyas
  6. Getting started with Jenkins Pipelines, by Miguel Suarez
  7. Replicate your custom Linux settings with DistroTweaks, by David Spring
  8. How to create LaTeX documents with Emacs, by Sachin Patil
  9. Git turns 13, Linux and SSH commands to know, Python programming, and more, by Rikki Endsley
  10. Build your first Redis Hello World application in Python, by Tague Griffith

Win a year of AdaBox

AdaBox is a US$ 60 per quarter service that delivers hand-picked Adafruit products, unique collectibles, and exclusive discounts to your door. Enter our giveaway by Sunday, April 29 at 11:59 p.m. Eastern Time for a chance to win.

Free 2017 Open Source Yearbook download

Our third annual open source community yearbook rounds up the top projects, technologies, and stories from 2017.

Call for articles

We want to see your JavaScript story ideas. Send article proposals, along with brief outlines, to rikki@opensource.com.

Stay up on what’s going on with Opensource.com by subscribing to our highlights newsletter.

Check out the editorial calendar for a preview of what’s ahead. Got a story idea? Send us a proposal!

LISA18 CFP now open

The CFP for LISA18 is open, and Brendan Gregg (Netflix) and I will co-chair this year’s event, which will be held Oct 29-31 in downtown Nashville. Do you have something to say about the present and future of Ops? If so, send in your talk proposal by May 24th. Follow LISA on Twitter to stay updated on deadlines and announcements. If you have questions or feedback, contact us at lisa18chairs@usenix.org.

Using less to view text files at the Linux command line

If there’s one thing you’re sure to find on a Linux system, it’s text files. A lot of them. Readme files, configuration files, documents, and more.

Most of the time, you probably open text files using a text editor. But there is a faster and, I think, better way of reading text files: using a utility called less. Standard kit with all Linux distributions (at least the ones I’ve used), less is a command-line text-file viewer with some useful features.

Don’t let the fact that it’s a command-line tool scare you. less is very easy to use and has a very shallow learning curve.

Let’s take a look at some of the things that you can do with less.

Getting started

Crack open a terminal window and navigate to a directory containing one or more text files that you want to view. Then run the command less filename, where filename is the name of the file you want to view.

The file takes over your terminal window, and you’ll notice a colon (:) at the bottom of the window. The colon is where you can type any of the internal commands you use with less. More on these in a moment.

Moving around

Chances are that the text file you’re perusing is more than a couple of lines long; it’s probably a page or more. With less, you can move forward in the file in a few ways:

  • Move down a page by pressing the spacebar or the PgDn key
  • Move down one line at a time by pressing the Down arrow key

less also allows you to move backward in a file. To do that, press the PgUp key (to move up a page at a time) or the Up arrow key (to move up one line at a time).

Finding text

If you have a large text file or are trying to find a specific piece of text, you can do that easily in less. To find a word or phrase, press / on your keyboard and type what you want to find.

Note that the search function in less is case-sensitive. Typing “the silence” isn’t the same as typing “The Silence.”

less also highlights the words or phrases you search for. That’s a nice touch that makes it easier for you to scan the text.

You can press n on your keyboard to find the next instance of the word or phrase. Press N (Shift+n) to find the previous instance.

Getting out of there

Once you get to the end of a text file and you’re done viewing it, how do you exit less? That’s easy. Just press q on your keyboard. (You can also press q at any time to leave the program.)
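Putting it all together, a short session might look like this (readme.md is just an example file; the indented lines are keystrokes typed inside less, not shell commands):

$ less readme.md
    /install        search forward for "install"
    n               jump to the next match
    N               jump back to the previous match
    q               quit and return to the shell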

As I mentioned at the beginning of this post, less is easy to use. Once you use it, you’ll wonder how you ever did without it.

How to build a plotter with Arduino

Back in school, there was an HP plotter well hidden in a closet in the science department. I got to play with it for a while and always wanted to have one of my own. Fast forward many, many years. Stepper motors are easily available, I am back into doing stuff with electronics and micro-controllers, and I recently saw someone creating displays with engraved acrylic. This triggered me to finally build my own plotter.

As an old-school 5V guy, I really like the original Arduino Uno. Here’s a list of the other components I used (fyi, I am not affiliated with any of these companies):

  • FabScan shield: Physically hosts the stepper motor drivers.
  • SilentStepSticks: Motor drivers, as the Arduino on its own can’t handle the voltage and current that a stepper motor needs. I am using ones with a Trinamic TMC2130 chip, but in standalone mode for now. Those are replacements for the Pololu 4988, but allow for much quieter operation.
  • SilentStepStick protectors: Diodes that prevent the turning motor from frying your motor drivers (you want them, believe me).
  • Stepper motors: I selected NEMA 17 motors with 12V (e.g., models from Watterott and SparkFun).
  • Linear guide rails
  • Wooden base plate
  • Wood screws
  • GT2 belt  
  • GT2 timing pulley

This is a work in progress that I created as a personal project. If you are looking for a ready-made kit, then check out the MaXYposi from German Make magazine.

Hardware setup

As you can see here, I started out much too large. This plotter can’t comfortably sit on my desk, but it’s okay, as I did it for learning purposes (and, as I have to re-do some things, next time I’ll use smaller beams).

The belt is mounted on both sides of the rail and then slung around the motor with some helper wheels:

I’ve stacked several components on top of the Arduino. The Arduino is on the bottom, above that is the FabScan shield, next is a StepStick protector on motor slots 1+2, and the SilentStepStick is on top. Note that the SCK and SDI pins are not connected.

Be careful to correctly attach the wires to the motor. When in doubt, look at the data sheet or an ohmmeter to figure out which wires belong together.

Software setup

While software like grbl can interpret so-called G-codes for tool movement and other things, and I could have just flashed it to the Arduino, I am curious and wanted to better understand things. (My X-Y plotter software is available at GitHub and comes without any warranty.)

The basics

To drive a stepper motor with the StepStick (or compatible) driver, you basically need to send a high and then a low signal to the respective pin. Or in Arduino terms:


digitalWrite(stepPin, HIGH);
delayMicroseconds(30);
digitalWrite(stepPin, LOW);

Where stepPin is the pin number for the stepper: 3 for motor 1 and 6 for motor 2.

Before the stepper does any work, it must be enabled.

digitalWrite(enPin, LOW);

Actually, the StepStick knows three states for the pin:

  • Low: Motor is enabled
  • High: Motor is disabled
  • Pin not connected: Motor is enabled but goes into an energy-saving mode after a while

When a motor is enabled, its coils are powered and it keeps its position. It is almost impossible to manually turn its axis. This is good for precision purposes, but it also means that both motors and driver chips are “flooded” with power and will warm up.
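If the plotter is going to sit idle for a while, the motor can be released again, based on the pin states listed above (a small sketch; enPin is whichever enable pin you wired, as in the earlier snippet):

digitalWrite(enPin, HIGH);  // disable the motor so the coils and driver can cool down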

And last, but not least, we need a way to set the plotter’s direction:

digitalWrite(dirPin, direction);

The following table lists the functions and the pins:

Function     Motor 1    Motor 2
Enable       2          5
Direction    4          7
Step         3          6

Before we can use the pins, we need to set them to OUTPUT mode in the setup() section of the code:


pinMode(enPin1, OUTPUT);
pinMode(stepPin1, OUTPUT);
pinMode(dirPin1, OUTPUT);
digitalWrite(enPin1, LOW);  // enable motor 1; the pins for motor 2 are configured the same way

With this knowledge, we can easily get the stepper to move around:


    totalRounds = …
    for (int rounds =0 ; rounds < 2*totalRounds; rounds++) {
       if (dir==0){ // set direction
         digitalWrite(dirPin2, LOW);
       } else {
         digitalWrite(dirPin2, HIGH);
       }
       delay(1); // give motors some breathing time
       dir = 1-dir; // reverse direction
       for (int i=0; i < 6400; i++) {
         int t = abs(3200-i) / 200;  // ramp the speed: the extra delay is largest at both ends of the move
         digitalWrite(stepPin2, HIGH);
         delayMicroseconds(70 + t);
         digitalWrite(stepPin2, LOW);
         delayMicroseconds(70 + t);
       }
    }

This will make the slider move left and right. This code deals with one stepper, but for an X-Y plotter, we have two axes to consider.

Command interpreter

I started to implement a simple command interpreter to use path specifications, such as:

"X30|Y30|X-30 Y-30|X-20|Y-20|X20|Y20|X-40|Y-25|X40 Y25

to describe relative movements in millimeters (1mm equals 80 steps).
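The interpreter itself lives in the GitHub repository linked above, but the core idea can be sketched in a few lines of Arduino code (handleToken and stepAxis are hypothetical names for illustration, not the actual functions from my repository):

const long STEPS_PER_MM = 80;                    // 1 mm equals 80 steps

// Interpret a single token such as "X30" or "Y-25" as a relative move.
void handleToken(String token) {
  char axis = token.charAt(0);                   // 'X' or 'Y'
  long mm = token.substring(1).toInt();          // signed distance in millimeters
  long steps = mm * STEPS_PER_MM;
  // stepAxis(axis, steps);                      // pulse the matching motor; the sign selects the direction
}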

The plotter software implements a continuous mode, which allows a PC to feed large paths (in chunks) to the plotter. (This is how I plotted the Hilbert curve in this video.)

Building a better pen holder

In the first image above, the pen was tied to the Y-axis with some metal string. This was not precise and also did not enable the software to raise and lower the pen (this explains the big black dots).

I have since created a better, more precise pen holder that uses a servo to raise and lower the pen. This new, improved holder can be seen in this picture and in the Hilbert curve video linked above.

The pen is attached with a little clamp (the one shown is a size 8 clamp typically used to attach cables to walls). The servo arm can raise the pen; when the arm goes down, gravity will lower the pen.

Driving the servo

Driving the servo is relatively straightforward: Just provide the position and the servo does all the work.


#include <Servo.h>

// Servo pin
#define servoData PIN_A1

// Positions
#define PEN_UP 10
#define PEN_DOWN 50

Servo penServo;

void setup() {
  // Attach to servo and raise pen
  penServo.attach(servoData);
  penServo.write(PEN_UP);
}

I am using the servo headers on the Motor 4 place of the FabScan shield, so I’ve used analog pin 1.

Lowering the pen is as easy as:

 penServo.write(PEN_DOWN);

Next steps

One of my next steps will be to add some end detectors, but I may skip them and use the StallGuard mode of the TMC2130 instead. Those detectors can also be used to implement a home command.

And perhaps in the future I’ll add a real Z-axis that can hold an engraver to do wood milling, or PCB drilling, or acrylic engraving, or … (a laser comes to mind as well).

This was originally published on the Some Things to Remember blog and is reprinted with permission.

LG re-open sources WebOS, a look at the AI behind the Pixel 2's camera, and more news

In this edition of our open source news roundup, we take a look at LG making WebOS open source (again), Google’s camera AI tools, a 3D-printed stethoscope, and more.

Open source news roundup for March 18-31, 2018

LG makes WebOS open source… again

What was once open source is open source again. After Korean electronics giant LG bought WebOS from HP, it made the mobile operating system proprietary. The company has done an about-face and has released WebOS Open Source Edition. The goal: to convince developers to adopt WebOS for tablets, set-top boxes, and more.

WebOS, which was originally designed for Palm’s failed line of smartphones, has been powering LG’s televisions since 2013. Since then, “LG has refined the platform significantly, and hopes that the new release will help others exploit it.” To try to get developers on the WebOS bandwagon, the company has made a software development kit and build instructions available.

Google open sources its camera AI tools

Ever wonder how Google’s Pixel 2 smartphones take such impressive portrait-mode photos? Wonder no more. Google has open sourced the artificial intelligence technology behind it.

Called DeepLab 3+, the technology “uses a neural network to detect the outlines of objects in your camera’s field of view.” That enables a camera to gain a greater depth of field and to more accurately identify objects it sees. You can grab the code for DeepLab 3+ from GitHub and learn more about how the technology works.

Researchers create clinically-validated 3D printed stethoscope

Something as simple as a stethoscope can make a huge difference to medical professionals, especially ones in developing countries and in conflict zones. High-quality stethoscopes can be a difficult-to-obtain commodity in those situations, though. Thanks to the work of Dr. Tarek Loubani of the University of Western Ontario, anyone with access to a 3D printer and ABS plastic can 3D print a high-quality stethoscope for less than $3.

Loubani’s stethoscope design, called the Gila, “was made using free open source software to keep costs low and allow others to easily access the code.” Loubani said that the Gila “is the first open source medical device that has been clinically validated and is widely available,” and that “the acoustic quality was the same in our stethoscope as in a premium brand stethoscope.” You can learn more at the Gila Free Medical Hardware GitHub repository.

New open source file indexing software

The laboratory that gave the world the atomic bomb has made its search and retrieval software open source. Los Alamos National Laboratory boasts that its Grand Unified File Index (GUFI for short), released under a BSD license, can perform within seconds queries “that would previously have taken hours or days.”

Gary Grider, who heads the High Performance Computing division at Los Alamos, said that GUFI “will have a big impact on the ability for many levels of users to search data and get a fast response.” That includes making “calculations that support national security, as well as basic scientific research in fields such as engineered materials, biological processes, and earth systems modeling.”

In other news

Thanks, as always, to Opensource.com staff members and moderators for their help this week. Make sure to check out our event calendar to see what’s happening next week in open source.