In this, my sixth and probably final article on the Go programming language, I’m running through some smaller features which I didn’t go into detail on in earlier articles. We’ll look at extending and copying slices in more detail, type assertions, error handling, common string manipulations, basic file I/O, embedding files in binaries, and context objects.
This is the 6th of the 6 articles that currently make up the “All Go” series.
In writing the preceding five articles in this series, I feel like I’ve acquired a pretty good understanding of Go as a language, and if you’ve read them then I hope they were useful to you too. There are some aspects of the language I haven’t yet covered, however, and this article is a bit of a miscellany of smaller features I thought could do with covering.
I kick off by looking back at slices, as I think it’s important to know how these work under the hood to use them correctly. Then I’m going to look at type assertions, which are somewhat similar to casting in C/C++. Next up we look at the use of the error type and the errors package, which seems quite widely used to encapsulate error return types. After that I’ll briefly review the string manipulations available in the standard library, since this is one of those parts of the standard library that’s always useful in every language. Then we’ll take a look at reading and writing files, followed by the ways that files and directories can be embedded into executables to be used at runtime. Finally, we’ll take a look at the context package, which is a standard way to allow functions handling a request to be cancelled if the request takes too long or becomes otherwise invalid, and also to carry other values across API calls.
So let’s jump in and take a deeper dive into how slices work.
I briefly mentioned slices way back in the first article in this series, but they’re such a common mechanism that I think it’s worth considering the nuances.
There are two builtin functions to manipulate slices that you should be familiar with: append() and copy().
append()

This is the workhorse of slices. Its purpose is to append values to a slice, but the key aspect is that if this would extend the slice beyond its current backing storage then it allocates new storage and copies the existing elements into it.
Let’s look at some sample code — take a look and see if you can predict what this will print out before you check it below to see if you were right.
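package main

import "fmt"

func main() {
    arr := [...]int{1, 2, 3, 4, 5, 6, 7, 8}
    slice := arr[2:4] // A 2-element view onto arr's storage.
    fmt.Printf("<1> arr=%v slice=%v\n", arr, slice)
    slice[1] = 44 // Also updates arr[3], which shares the same storage.
    fmt.Printf("<2> arr=%v slice=%v\n", arr, slice)
    slice = slice[:4] // Re-slice to cover two more elements of arr.
    fmt.Printf("<3> arr=%v slice=%v\n", arr, slice)
    slice = append(slice, 77, 88) // Still fits within arr's storage.
    fmt.Printf("<4> arr=%v slice=%v\n", arr, slice)
    slice = append(slice, 99) // Exceeds capacity, so a new backing array is allocated.
    fmt.Printf("<5> arr=%v slice=%v\n", arr, slice)
    slice[1] = 444 // Only the new backing array is modified now.
    fmt.Printf("<6> arr=%v slice=%v\n", arr, slice)
}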
<1> arr=[1 2 3 4 5 6 7 8] slice=[3 4]
<2> arr=[1 2 3 44 5 6 7 8] slice=[3 44]
<3> arr=[1 2 3 44 5 6 7 8] slice=[3 44 5 6]
<4> arr=[1 2 3 44 5 6 77 88] slice=[3 44 5 6 77 88]
<5> arr=[1 2 3 44 5 6 77 88] slice=[3 44 5 6 77 88 99]
<6> arr=[1 2 3 44 5 6 77 88] slice=[3 444 5 6 77 88 99]
Recalling that a slice is only ever a reference to some underlying storage, the sequence of events here is:

1. slice starts off being a reference to a 2-element subset of arr, starting at the third element.
2. We modify slice[1], which corresponds to arr[3], so you see that element updated in both.
3. We re-slice slice to cover two additional elements from arr — if arr was not big enough, this would panic with a “slice bounds out of range” error.
4. We use append() to extend slice and add two new elements — since these overlap with arr, those items are also overwritten.
5. We call append() again, but this time we’re adding elements beyond the range of arr, so append() allocates a whole new backing array, copies all the current elements across and then adds the new one, making slice now point to this new array.
6. We modify slice[1] again, but since slice now points to a separate array, we can see that only the new array is modified and arr is left unchanged.

copy()
If you want to do the copying without the allocation, there’s also a copy(dst, src) builtin function — this copies elements from src into dst. The number of elements copied will be the minimum of the two sizes, so this operation will never change the size of the destination slice, and never needs to allocate any memory.
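As a quick illustration, here’s a minimal sketch of copy() in action. Note how the destination’s length caps the number of elements copied.

src := []int{1, 2, 3, 4}
dst := make([]int, 2)
n := copy(dst, src) // Copies min(len(dst), len(src)) elements.
fmt.Println(n, dst) // Prints: 2 [1 2]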
In the first article, we saw how the switch statement could be used with x.(type) to switch on the type of x instead of its value. This is actually a special case of a general syntax for a type assertion.
The type assertion performs a runtime check of whether the dynamic type of the underlying value is the same as that asserted. If it is, then the expression evaluates to the value of the target type — if it isn’t, the behaviour depends on the variant of the type assertion used.
The general form of the type assertion is x.(T), where just the new value is returned, as in the example below.
var x interface{} = 123
i := x.(int)
In this example, a simple assignment wouldn’t work because x has type interface{}, even though the underlying type of the value is int. However, with the type assertion x.(int) this becomes valid.
In this form, where only a single value is used, any type mismatch causes a runtime panic. It’s worth noting that a mismatch when using builtin types like int will occur if there is any difference — for example, if x had type int64 above, the type assertion would still panic.
In the specific case of an assignment, a two-value return form can be used to avoid the risk of a panic. Consider the modified example below.
i, ok := x.(int)
In this case, if the underlying type of x is int as in the example above, i is assigned as before and the value of ok will be true. However, if the types don’t match then i will have its default value and ok will be false to indicate the type mismatch.
Perhaps a more useful general approach is to use this with interfaces. For example, if you have a value and you want to check whether it provides a specific method, you can use an anonymous interface specification to do that.
entity, ok := x.(interface{ Render() })
if ok {
    entity.Render()
} else {
    fmt.Println("No render method")
}
In general in your own code, I would say this sort of practice isn’t ideal. If you need to fall back on this sort of thing, it points to a lack of thought in how you’ve structured your interfaces, or an unreasonable fear of refactoring code.
However, there are times when you’re building generalist APIs for other people to use and you might want to, say, support some context associated with a handle — this is the sort of thing you’d use void* for in C, for example, and in Go you could use interface{}. In these cases, the calling code knows the type of the underlying value, it’s just been lost by the generic nature of the API through which it’s been passed. My advice is to be aggressive at challenging yourself on cases like this, though, and see if there’s a better way which doesn’t involve this sort of fragile type casting.
As we’ve already seen in earlier articles, it’s fairly standard practice to use two return values to flag errors — one being the actual return value of the function, the other being an error code to indicate whether the operation was successful. In this section we’ll look in a bit more detail at how errors can be raised and handled.
Although you could use any type you like, there is a builtin error interface for such values. Sticking to this convention means that your code will be more idiomatic and more easily understood by other experienced Go developers. The only requirement to meet the error interface is to provide an Error() method which returns a string representation of the error.

The errors package provides some useful additional functions, such as a New() function that accepts a string and returns a corresponding error instance. If you want to do some formatting to construct your string, the fmt.Errorf() function provides a convenient wrapper which is equivalent to using fmt.Sprintf() then passing the result to errors.New().
func addOne(arg int) (int, error) {
    if arg < 0 {
        return 0, fmt.Errorf("Value %d was negative", arg)
    } else {
        return arg + 1, nil
    }
}
There’s an additional nuance when raising errors whilst handling other errors, which is that an error value can be used to “wrap” other error values. This is typically done when an error is handled and then re-raised — the outer scope handling the error can add some more useful contextual information to it. This is particularly valuable when the code raising the original error is some commonly shared function, and the useful information about how to fix it is more likely found in the calling scope.
Consider these two functions.
import (
    "errors"
    "fmt"
    "math"
)

func sqrtOfInt(x int) (float64, error) {
    s := math.Sqrt(float64(x))
    if math.IsNaN(s) {
        return 0, errors.New("NaN")
    } else {
        return s, nil
    }
}

func intSqrtOfInt(x int) (int, error) {
    s, err := sqrtOfInt(x)
    if err != nil {
        return 0, fmt.Errorf("Error for %d: %w", x, err)
    }
    return int(s), nil
}
The sqrtOfInt() function raises an error using the New() function from the errors package, which creates a new error instance with the specified string. When this is handled within intSqrtOfInt() it’s wrapped in a further error using the %w format verb of fmt.Errorf(). This returns an error instance which also provides an Unwrap() method, which itself returns the wrapped error instance.
If you call intSqrtOfInt() with -1 as a parameter, and then use fmt.Printf() with %v to print the returned error, you’ll get the string Error for -1: NaN. However, you can also access the wrapped error directly by using the errors.Unwrap() function after importing the errors package. This will return either the wrapped error instance if there is one, or nil otherwise.
The way errors.Unwrap() works is to check whether the passed error provides an Unwrap() method which returns error — if it does, it calls it and just passes on the return value.
Wrapped errors can be nested arbitrarily, and the errors package provides some other functions to deal with this. Firstly, errors.Is() will check whether the error, or any of the errors it wraps, matches a specified error instance. By default this checks to see whether any of the wrapped errors compare equal to the instance passed as a parameter to Is(), which doesn’t sound particularly useful to me — I don’t know how often the calling code is going to have the specific instance of the error that might be returned for it to compare against. However, the other way that Is() can return a match is if the error type provides an Is() method of its own which returns bool to indicate whether a given instance is an error of the same type.
Secondly, errors.As() is similar to Is() except that it takes a second parameter target to use for output. The first wrapped error which matches (if any) is assigned to target. In this case an error is considered matching if either its type can be successfully assigned to target, or if the wrapped error implements an As() method which returns true when called passing in target.
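To make this concrete, here’s a minimal sketch of errors.As() pulling a custom error type out of a wrapped chain. The NotFoundError type and lookup() function are names I’ve invented for illustration.

package main

import (
    "errors"
    "fmt"
)

// NotFoundError is a custom error type carrying extra context.
type NotFoundError struct {
    Name string
}

func (e *NotFoundError) Error() string {
    return e.Name + " not found"
}

func lookup(name string) error {
    // Wrap the custom error with some extra context.
    return fmt.Errorf("lookup failed: %w", &NotFoundError{Name: name})
}

func main() {
    err := lookup("config.toml")

    // errors.As() searches the chain of wrapped errors for a *NotFoundError.
    var nfe *NotFoundError
    if errors.As(err, &nfe) {
        fmt.Println("missing item:", nfe.Name)
    }
}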
A recent update, in version 1.20, added the ability for an error to wrap multiple errors. In this case the Unwrap() method can return a []error slice instead of just error as mentioned above.
There are two ways to create an error which wraps multiple others:

- fmt.Errorf() with multiple %w occurrences in the format string, which returns fmt.wrapErrors (as opposed to fmt.wrapError when a single error is wrapped).
- errors.Join() passing multiple error instances, which returns errors.joinError.

Both of these implement a version of Unwrap() which returns []error.
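For example, here’s a minimal sketch of errors.Join() in action, assuming errors and fmt are imported:

err1 := errors.New("first failure")
err2 := errors.New("second failure")
err := errors.Join(err1, err2)
fmt.Println(err)                  // Prints both messages, one per line.
fmt.Println(errors.Is(err, err1)) // true — Is() checks all the wrapped errors.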
Whilst this may be a useful enhancement in some ways, it has some slightly awkward consequences. For example, if you pass an error wrapping multiple others to errors.Unwrap(), then it will return nil in the same way as if there were no wrapped errors. If you want to iterate through them, you’ll need to call Unwrap() yourself, and to do this safely you’ll need to check the error does implement a version of Unwrap() which returns []error, with a type assertion (which we looked at above).
As far as I can tell, to handle all these errors generically you’d need something like this code.
switch e := err.(type) {
case interface{ Unwrap() []error }:
    fmt.Println("Multiple errors wrapped:")
    for i, wrapped := range e.Unwrap() {
        fmt.Printf(" Error %d: %v\n", i, wrapped)
    }
case interface{ Unwrap() error }:
    fmt.Printf("Unwrapped to: %v\n", e.Unwrap())
default:
    fmt.Printf("Error: %v\n", err)
}
No matter how much you attempt to keep your data structured and clean, you always seem to end up having to deal with manipulating strings at some point. Whether it’s joining a list of strings with a delimiter, splitting them back up again, stripping whitespace, or searching for a substring, these are really common operations that you need to be able to perform. As a result, I wanted to take a little tour of the facilities the Go standard library offers in this area.
The functions I’m going to look at are provided by the strings and fmt packages. I’m certainly not intending to cover these completely — the reference documentation for strings and fmt does that far better than I could. It’s more to give you a flavour of the functions that perform the most common operations.
A particularly common case is to have a set of strings that you want to join together into a single string, for example to display it. If you’ve already collected these strings into a slice, the strings.Join() function has you covered.
strs := []string{"On your marks", "Get set", "Go!"}
singleString := strings.Join(strs, "... ")
Sometimes, however, we need to construct strings as we traverse some path of conditional code sections, and in these cases building up a slice of them only to join them at the end can feel a little cumbersome. In these cases it may be more convenient, and potentially more efficient, to use strings.Builder — this is a little like a C++ std::ostringstream or Python’s io.StringIO. You treat the object like a writable file, with the results being stored in memory which you can then recover.
var b strings.Builder
for i := 0; i < 5; i++ {
    fmt.Fprintf(&b, "Loop %d\n", i)
}
fmt.Println(b.String())
This has the advantage of supporting calls like fmt.Fprintf() for formatting, and it also supports a WriteString() method for simply appending a string instead of treating it like a filehandle, which is nice for simpler cases.
It’s essentially just a wrapper around an underlying []byte which is resized as required by the builtin append() function that we saw earlier in this article. There’s also a Grow() method to reserve space for an additional n bytes — this is implemented by allocating a new, larger buffer and using the builtin copy() to move existing items there. Therefore, if you do have some idea of the expected size of the resultant string, it’ll be more efficient to call Grow() before appending anything to avoid later piecemeal resizing.
For splitting one string into many based on a delimiter, there are several options which offer slightly different variants on the functionality.

- strings.Split(s, sep string) []string — Splits s around each instance of sep, as you might expect. As a special case, if sep is the empty string then it splits up UTF-8 sequences (not bytes). There is no limit on the number of elements it might return.
- strings.SplitAfter(s, sep string) []string — As Split() except that the trailing delimiter is preserved in each split string, except the last which doesn’t have a trailing delimiter.
- strings.SplitN(s, sep string, n int) []string — As Split() except the output slice will have at most n items in it — the final element will be the unsplit remainder of the string. If you pass 0 for n you’ll always get nil, and if you pass a negative number this acts as Split().
- strings.SplitAfterN(s, sep string, n int) []string — SplitAfterN() is to SplitN() as SplitAfter() is to Split().
- strings.Cut(s, sep string) (before, after string, found bool) — Similar to SplitN() with 2 passed for n — i.e. it splits s on only the first occurrence of sep. The difference is that if sep was found it returns the substrings before and after it as separate values instead of a slice, and found will be returned as true. If sep was not found, before will be the entirety of s, after will be empty and found will be false.
- strings.Fields(s string) []string — Fields() acts like Split() but splits on any character that Unicode deems whitespace. Another difference is that Fields() will collapse together sequences of consecutive separators, as opposed to Split() which will yield empty strings for each zero-width gap between consecutive separators.
- strings.FieldsFunc(s string, f func(rune) bool) []string — A variant of Fields() for when you want the collapsing behaviour, but you want to use some other definition of which characters to split on. This one additionally accepts a function which is passed Unicode code points and is expected to return true if this is a splitting character and false
otherwise.

This is another one that seems to crop up all the time. strings.TrimSpace() is your go-to function for the common case of stripping whitespace — it removes any leading or trailing characters from the string which Unicode defines as whitespace.
There’s also strings.Trim(), which removes any of a set of specific characters you specify in a second input string, or there’s TrimFunc() which does the same but removes characters until a call to a specified function returns false. These also have left/right variants (e.g. TrimLeft(), TrimRightFunc()) which only strip from the specified end of the string.
There are also TrimPrefix() and TrimSuffix() which will remove a specific string if it exists at the start or end of the input string respectively — these are analogous to the Python str methods removeprefix() and removesuffix(). The strings package also offers variants of these, CutPrefix() and CutSuffix(), where the difference is that they also return a bool indicating whether anything was actually removed, which saves you calling the separate HasPrefix() and HasSuffix() functions.
If you just want to check whether a string contains a given substring, strings.Contains() returns a simple bool to this effect. There’s also ContainsAny() if you want to check for any of a set of characters, and ContainsRune() if you’re searching for a specific Unicode code point.
If you actually need to know where the substring is, Index() will return the index (in bytes) of the first occurrence of the specified substring, or -1 if none was found. IndexAny() will do the same for any of a set of individual characters, IndexByte() can be used to find a specific byte value ignoring Unicode interpretations, and IndexRune() will locate a specific code point. There’s also IndexFunc() which returns the index of the first character for which a specified function returns true. There are also LastIndex...() variants of most of these — there is no LastIndexRune(), however, and I don’t know whether there are fundamental reasons for that or it’s just an oversight.
Finally, if you want to replace those substrings with something else, the Replace() function will replace the first n non-overlapping occurrences of a specified string with another specified string. If you pass a negative value of n then it replaces all occurrences, although there is also a separate ReplaceAll() function which appears to behave identically — I’m not sure if this is simply because it was felt that the magic negative number behaviour was not very readable, or whether there’s some other subtle difference between these functions that I’m missing.
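For example:

fmt.Println(strings.Contains("gopher", "phe"))    // true
fmt.Println(strings.Index("go gopher", "go"))     // 0
fmt.Println(strings.LastIndex("go gopher", "go")) // 3
fmt.Println(strings.Replace("aaaa", "a", "b", 2)) // "bbaa"
fmt.Println(strings.ReplaceAll("aaaa", "a", "b")) // "bbbb"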
As we’ve already seen used quite a few times, the functions within fmt can be used for string formatting — these are heavily based on the printf() family of functions in the C standard library.
I’m not going to go through the formatting codes because the reference documentation is a much better source of that, but I thought it’s useful to look at the range of functions available.
Firstly, each function is typically available in three variants — let’s use fmt.Print() as an example:

- fmt.Print(arg ...any) (int, error) — Writes its arguments to standard output in their default formats, adding spaces between arguments only when neither adjacent operand is a string.
- fmt.Println(arg ...any) (int, error) — As Print() but always adds a space between arguments, even if they’re strings, and also appends a newline to the end.
- fmt.Printf(format string, arg ...any) (int, error) — Formats the arguments according to the format string, which follows a series of rules heavily borrowed from C printf().

Each of the functions below in fmt has these three variants, and otherwise they just differ in where the resultant string is sent. Here are the options:
- fmt.Append(b []byte, …) []byte — Appends the formatted text to the passed byte slice and returns the extended slice.
- fmt.Fprint(w io.Writer, …) (int, error) — Writes the formatted text to the specified io.Writer.
- fmt.Fscan(r io.Reader, …) (int, error) — The Fscan() function assumes space-separated values and treats newlines as whitespace, whereas Fscanf() expects the input to match the specified format string, and newlines in the input must be matched by newlines in the format string. The Fscanln() function is similar to Fscan() but stops at the first newline, and also expects a newline or EOF after the final item. All these functions return the number of items they successfully initialised, and if this was less than the number provided then the error return indicates the reason why.
- fmt.Print(…) (int, error) — As Fprint(), but always writes to standard output.
- fmt.Scan(…) (int, error) — As the Fscan() family, but always reads from standard input instead of a specified file.
- fmt.Sprint(…) string — Returns the formatted text as a string instead of writing it anywhere.
- fmt.Sscan(str string, …) (int, error) — As Fscan() but uses the specified string as an input instead of an open file.
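A couple of these in action:

name := fmt.Sprintf("%s-%03d.log", "web", 7)
fmt.Println(name) // web-007.log

var a, b int
n, err := fmt.Sscan("10 20", &a, &b)
fmt.Println(n, err, a, b) // 2 <nil> 10 20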
There are also facilities in this package for types to implement their own support both for being serialised out to strings, and also for being read back in by functions like fmt.Scan() — this is a bit involved and beyond the scope of this article, however.
Filesystem manipulations are among the most fundamental operations you need to be able to perform in most languages, although perhaps they’re less critical than they once were in the network-first environments in which we deploy software these days. Still, it’s critically important to at least be familiar with the basics.
If you want to consume the entirety of a file into memory, your most basic option is os.ReadFile(), which takes a file path as its sole parameter and returns either a []byte or an error.
Unless you’re very confident that the size of the file will definitely fit comfortably into memory, however, you’ll often want to perform more fine-grained operations yourself and this is where the os.File type comes in, which represents an open file descriptor. To open an existing file just call os.Open(), passing in the filename to open — it returns a *File and an error in case of problems.
It’s worth noting there is also os.OpenFile() which allows other flags to be specified (e.g. O_RDWR, O_CREATE, O_EXCL) and an os.FileMode value to specify the permissions used to create a file if O_CREATE was specified as a flag. The documentation suggests most people won’t need this level of control — frankly I’m not so sure, but equally I don’t know whether using this lower-level function might make your code less cross-platform as well. Either way, it’s worth at least being aware that it’s there if you need it.
Once you have your file descriptor, it has the array of methods you’d expect for manipulating it. Some common examples:

- (f *File) Read(b []byte) (int, error) — Reads up to the size of b, or EOF, whichever comes first. The data is stored into b and the number of bytes read is returned, along with any error.
- (f *File) Seek(off int64, whence int) (int64, error) — Moves the location of the next Read() (or Write()), where whence specifies the interpretation of off: 0 is relative to the start of the file, 1 is relative to the current offset and 2 is relative to the end of the file.
- (f *File) ReadAt(b []byte, off int64) (int, error) — As Read(), but starting at offset off relative to the start of the file. This isn’t affected by, and doesn’t affect, the current read position within the file.
- (f *File) Stat() (FileInfo, error) — Returns the FileInfo structure defined within the fs package.

These will probably seem quite familiar to anyone who’s done file IO in a lot of other languages. Of particular note, however, is the fact that you don’t directly specify the maximum number of bytes to read from the file as you would with the underlying read() system call — instead the size of the destination byte slice is used to determine this. This could be annoying if you want to read a very limited number of bytes for performance reasons, but it does mean that you can’t overflow your target buffer by choosing a size that’s too large.
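For example, here’s a minimal sketch of reading the first few bytes of a file, where /etc/hosts is just an arbitrary file that exists on most Unix-like systems:

f, err := os.Open("/etc/hosts")
if err != nil {
    panic(err)
}
defer f.Close()

buf := make([]byte, 16) // The buffer size caps the read at 16 bytes.
n, err := f.Read(buf)
if err != nil {
    panic(err)
}
fmt.Printf("read %d bytes: %q\n", n, buf[:n])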
The facilities for writing to existing files, and creating new ones, are similarly familiar. As with os.ReadFile(), there is a high-level os.WriteFile() which will take a []byte and write it to the specified filename, truncating it if it already exists or creating it if not. It’s worth noting that this is just a convenience for several other calls, so a hard failure in the midst of these operations can leave the target file in an indeterminate state.
For those wanting more control, the os.OpenFile() mentioned above can be used to open the file for writing, and the O_CREATE and O_EXCL flags have their conventional meanings as with the usual C open() call [1].
Alternatively there’s os.Create() which either creates a new file, or truncates an existing one — this is equivalent to OpenFile() with O_RDWR, O_CREATE and O_TRUNC. There’s no way to call this non-destructively if you want to append to an existing file, however — in that case, OpenFile() with appropriate flags is your only option.
Once you’ve opened your file for writing you can use Write() and WriteAt(), which are the direct analogues of Read() and ReadAt() mentioned earlier, and Seek() applies to both reading and writing. There’s also a useful WriteString() method which takes a string instead of a []byte.
Truncate() Gotcha

There’s also the good old Truncate() function which sets the size of a file to a specific length. However, it’s worth noting that this function will not affect any current seek position within the file, so you can end up with the location of the next Write() being outside the valid size of the file — depending on your platform and filesystem, this could create sparse files, so be careful.
If you’re not familiar with this, I’ll just give you a quick example. I’m going to run the following small application on a MacOS system.
package main

import "os"

func check(err error) {
    if err != nil {
        panic(err)
    }
}

func main() {
    wf, err := os.Create("/tmp/sparse")
    check(err)
    _, err = wf.WriteString("one two three four five six seven\n")
    check(err)
    check(wf.Truncate(10))
    _, err = wf.WriteString("XXX\n")
    check(err)
}
After running this, let’s take a look at the contents of the file.
$ cat /tmp/sparse
one two thXXX
Well, that seems like what we’d expect — let’s just check the file size to make sure.
$ stat -f "%z" /tmp/sparse
38
That’s odd, it’s saying the file is 38 bytes long — but there’s clearly only 13 characters there. Let’s try passing -t to cat to show non-printing characters, in case we’re missing something.
$ cat -t /tmp/sparse
one two th^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@XXX
Whoa, where did all those nul characters come from? The answer is that after we called Truncate(), the write pointer was still at the original offset of 34 bytes into the file. So when we call the second WriteString(), it puts that into hyperspace off the end of the file — as a result, the filesystem has either created a true sparse file, or it’s just padded the content with nuls. We can’t really tell, because reading back the gaps between content in a sparse file will yield nuls anyway, but either way, it’s probably not what we expected to happen!
It’s beyond the scope of this article to drill into what’s going on here in more detail, but I just wanted to highlight it because it’s the sort of behaviour that can take people by surprise sometimes, especially as the specific consequences can vary across platforms.
As well as manipulating individual files, of course, we need facilities to find and remove them.
The os.ReadDir() function can be used to browse a directory structure, yielding a []DirEntry where DirEntry is defined in the fs package. Each entry has the following methods:

- Name() returns the filename of the entry.
- IsDir() returns true if the entry is a subdirectory.
- Type() indicates the basic type (e.g. file, symlink, named pipe).
- Info() returns the FileInfo structure, the same as Stat().

As well as this simple interface, the filepath package provides a couple of functions which are also useful. It’s worth noting that these also overlap with the fs package, which appears to be a somewhat abstract interface over arbitrary filesystem-like interfaces. I haven’t quite managed to figure out the distinctions between them, but it looks like filepath is the one to use for standard local filesystem access.
filepath.Glob() returns a []string of all the filenames matching a specified glob pattern. Any errors reading directories are ignored, so it can be convenient but might also disguise problems like permissions errors.
The second function is filepath.WalkDir(), which is quite similar to Python’s os.walk(). It accepts a root directory for the search and a callback function — this function is invoked for every file and subdirectory contained within the specified root or any subdirectory of it. This function returns an error which controls how the directory recursion proceeds — special errors can be returned which cause either the current directory, or the remainder of all directories and files, to be skipped. Otherwise, returning any other error aborts the process and raises the returned error from WalkDir(), and returning nil continues the recursion as normal.
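Here’s a minimal sketch of how that looks in practice. It assumes imports of fmt, io/fs and path/filepath, and just prints every path under the current directory whilst skipping any .git directory it finds:

err := filepath.WalkDir(".", func(path string, d fs.DirEntry, err error) error {
    if err != nil {
        return err // Propagating an error aborts the whole walk.
    }
    if d.IsDir() && d.Name() == ".git" {
        return filepath.SkipDir // Special error: skip this directory's contents.
    }
    fmt.Println(path)
    return nil
})
if err != nil {
    fmt.Println("walk failed:", err)
}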
It’s worth noting there’s also a function filepath.Walk() which predates WalkDir(), but it is less efficient since it needs to make a separate call to Stat() on each entry it finds.
The os package offers the usual functions such as Chmod() and Chown() for manipulating file metadata, depending on the platform. os.Mkdir() creates directories, and os.MkdirAll() also creates all necessary parent directories at the same time.
Link() creates hard links, Symlink() creates symbolic ones, and Readlink() queries them. All these calls seem quite Unix-focused, so I’ll be interested to see how (and if) they’re implemented on Windows.
os.Remove() deletes a named file or empty directory, and RemoveAll() will recursively remove a directory and all of its contents — use with care!
As well as the facilities outlined above, there’s also a bufio package which provides buffering and some convenience functions for both reading and writing.
It acts as a wrapper around another reader or writer, reading blocks of data and then allowing higher-level parsing functions to process this data in smaller chunks. If you’re going to be doing many small reads from a file, it could be particularly beneficial to use this package for better performance, as performing many small reads directly from the filesystem will increase latency significantly.
It also offers some handy functions for splitting files into lines, words or in other ways which you can customise by providing your own function. I’m not going to drill into this in too much detail, but as a brief example here’s some code that will count the words in files provided on the command-line.
package main

import (
    "bufio"
    "fmt"
    "os"
)

func check(path string, err error) bool {
    if err == nil {
        return false
    }
    fmt.Fprintf(os.Stderr, "Failed to read %s: %v\n", path, err)
    return true
}

func countWords(path string) {
    inputFile, err := os.Open(path)
    if check(path, err) {
        return
    }
    defer inputFile.Close()
    scanner := bufio.NewScanner(inputFile)
    scanner.Split(bufio.ScanWords)
    words := 0
    for scanner.Scan() {
        words++
    }
    if check(path, scanner.Err()) {
        return
    }
    fmt.Printf("%s: %d word(s)\n", path, words)
}

func main() {
    for _, path := range os.Args[1:] {
        countWords(path)
    }
}
You can see here that we create a new bufio.Scanner object which wraps the file, and then the Split() method is used to indicate which splitting function we want to use. In this case we’re using the builtin ScanWords function, which splits on whitespace, but there’s also ScanLines, ScanBytes and ScanRunes, the last of which yields complete Unicode code points.
We repeatedly call the Scan() method to advance to the next token, as defined by the splitting function, and if we wanted the contents of the token we could use the Text() or Bytes() methods to retrieve it as a string or []byte respectively. In this case, however, we’re just interested in counting words so we don’t need to look at the token, we just increment a counter.
You can also define your own splitting function to use with bufio.Scanner. Since the reading is done in chunks, the function is called with whatever data is currently buffered, and also a flag indicating whether there’s any more data in the input file beyond that. The function is expected to emit tokens one at a time, along with an indication of the number of bytes it has consumed from the input data to create that token. It can also return an error to halt parsing, and a special ErrFinalToken error indicates no more tokens will be emitted but the input was valid.
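As a minimal sketch, here’s what a custom splitting function might look like. This one splits its input on commas, and the scanCommas name is my own invention:

// scanCommas is a bufio.SplitFunc which emits comma-separated tokens.
func scanCommas(data []byte, atEOF bool) (advance int, token []byte, err error) {
    if atEOF && len(data) == 0 {
        return 0, nil, nil // No data left: scanning is finished.
    }
    if i := bytes.IndexByte(data, ','); i >= 0 {
        // Found a comma: consume it and emit everything before it.
        return i + 1, data[:i], nil
    }
    if atEOF {
        // No trailing comma: the remainder is the final token.
        return len(data), data, nil
    }
    return 0, nil, nil // Ask the Scanner to read more data.
}

You’d install this with scanner.Split(scanCommas) in place of bufio.ScanWords in the earlier example, after also importing the bytes package.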
One point that I do note is that the function doesn’t have any kind of shared state that’s passed in for each call, so if you wanted to keep track of additional statistics or the like, in addition to emitting tokens, you’ll need to pass in a closure. It’s also a slight shame that the interface constrains tokens to all be []byte — it would be nice to build a parser on top of this which does parsing and tokenising at the same time. Still, I suppose the only other option would be either generics, which probably didn’t exist in Go when this was first added, or returning interface{}, which isn’t the most helpful interface either.
As a point of interest, I ran this over some long text — specifically War and Peace by Leo Tolstoy repeated about 25 times, which is about 80MB of text — to examine its performance. Clearly the code above isn’t particularly optimised, but I was interested to get some idea. It took about 0.92 secs in wallclock time, comprising 0.90 secs user and 0.02 secs system [2].
By way of comparison, a very naive Python implementation, basically just using a linewise iteration of the file and len(line.split()) to count words, took 2.44 secs wallclock with 2.26 secs user and 0.05 secs system. We can see that Python is clearly less efficient, using up more time in user space, but the IO performance isn’t too much worse as the system time is still low. Conversely, the wc utility took just 0.36 secs in wallclock time, with 0.35 secs user and 0.01 secs system — since this is most likely written in C, we can observe there’s still some performance gap. In general, Go is sitting more or less exactly where I would have expected: significantly better than Python, but less fast than well written C code.
One thing that comes up from time to time is the need to create a temporary file or directory in a platform-independent way, and ideally to automatically remove it again when you’re done.
To create a temporary file we can use os.CreateTemp(), passing in a directory and a prefix — the function will append some random unique string and create the file, opening it for reading and writing. You can pass an empty string for the directory and the default temporary directory on the platform will be used, as returned by os.TempDir(). The return value is the same as for Create().
To create a temporary directory there’s os.MkdirTemp(), which takes the same parameters and creates the resultant directory. One difference here, however, is that the name of the directory is returned instead of an open filehandle.
In both cases, however, the developer is responsible for making sure the files are cleaned up at an appropriate time. The easiest way to do this is with something like defer os.Remove(tmpFile.Name()) or defer os.RemoveAll(dirname) respectively.
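Putting that together, a minimal sketch (the myapp-*.txt pattern is just an arbitrary example, where the * is replaced by the random string):

tmpFile, err := os.CreateTemp("", "myapp-*.txt") // "" means use the default temp dir.
if err != nil {
    panic(err)
}
defer os.Remove(tmpFile.Name()) // Clean up the file when we're done.
defer tmpFile.Close()

fmt.Fprintln(tmpFile, "some scratch data")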
One more unusual package I came across whilst perusing the standard library was called embed, and I thought this was kind of useful. It’s a conceptually simple package that allows you to embed files from the repository into your binary in the form of variables.
Let’s suppose you had a file called somefile.txt in the same directory as a Go source file — you could embed it by doing this:
import _ "embed"
…
//go:embed somefile.txt
var myFile string
At runtime, the content of the myFile string will be initialised to the contents of somefile.txt. You’ll note that we’re importing embed using the blank _ name — this is because we’re not actually referring to embed anywhere in the source file (except in a comment) in this example, so if we don’t use the _ "embed" syntax then we’ll get an error about an unused import.
The //go:embed used here is an example of a compiler directive comment — these almost invariably start with //go: and it’s important that there are no spaces or it won’t be recognised. There are a bunch of other such directives about which I know very little, but you can find a brief introduction to some of them in this post by Burak D. on Medium.
Instead of string we could have made the type []byte for embedding binary files, with much the same effect.
However, this facility is more flexible than simply pulling individual files into strings or byte slices — it can be used to embed a whole series of files, even using wildcards.
Consider this snippet.
//go:embed images templates css
//go:embed html/index.html
var content embed.FS
In this example, images, templates and css are directories, and all files and subdirectories beneath them will also be included. index.html is a file in the html/ directory. Note that the type of content is embed.FS, and this causes the files to be made available in a virtual filesystem through the content variable. This filesystem meets the interface specified by the fs package, which I mentioned earlier — this seems to be a standard abstract interface that code can use to access filesystems, and which can be made applicable to multiple different backends, such as an embed.FS.
Just for illustration, code could call content.ReadFile() passing in a pathname relative to the virtual filesystem root, and the entire content of that file would be returned as a []byte. This virtual filesystem is entirely read-only, so it’s always safe for multiple threads of code to access it concurrently.
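For example, a minimal sketch using the content variable declared above:

data, err := content.ReadFile("html/index.html")
if err != nil {
    panic(err)
}
fmt.Println(string(data)) // The embedded file's contents.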
This seems like a nifty feature for a number of cases where you want to distribute files with your application, but you’d prefer the convenience of a single binary. A few examples of where this could be useful off the top of my head are:

- A web server embedding its own static assets, templates and stylesheets.
- A command-line tool shipping default configuration files.
- An application embedding the database migration scripts it needs to apply.
The context package provides a Context object which is used to allow an operation to be cancelled wherever it is across multiple function calls, and also to allow request-specific values to be carried across. This is generally intended for servers which handle requests — each request has its own Context object which is passed into the major API calls made whilst processing the request.
It’s suggested that the context object is passed as the first parameter, typically named ctx.
func HandleRequest(ctx context.Context, …) error {
…
}
Sometimes these objects may be stored within other objects, although the documentation for the context package does imply that they should be passed explicitly, not within another structure. For example, the net/http package embeds a context.Context inside a http.Request object.
There are two ways to create a new empty context, which are context.TODO() and context.Background(). As far as I can tell these are basically the same, in that they create a default context with nothing set yet, but they signal different intentions to those reading the code — context.TODO() implies that this is just a placeholder which will eventually be replaced by a context passed in from somewhere else, but which isn’t available yet in the evolution of the code.
Contexts can be used to impose timeouts, or to allow requests to be otherwise cancelled, in a way which can apply even as the request passes through API boundaries. This is useful as the request may pass through multiple functions which perform blocking requests, and the cancellation should apply regardless of which one of them is currently working on the request at that moment.
To use the features listed below, there are functions in the context package which return a copy of the passed context with some modification applied — this means that multiple features can be set up by calling these functions in succession on the same context.
The functions which are available to modify a context are:
- context.WithCancel(parent Context) (Context, CancelFunc) — Returns a copy of the context with a Done channel added. This channel will be available to users of the context via the Done() method and can be used to cancel a request halfway through. The second returned value is a function which, when called with no parameters, will cause the Done channel to be closed — request handling functions can check for channel closure to know they should abort handling the request and return ASAP.
- context.WithCancelCause(parent Context) (Context, CancelCauseFunc) — As WithCancel() except the CancelCauseFunc that’s returned takes an error parameter which indicates the reason for the cancellation. The request handling code can recover this error by calling context.Cause() and passing the context in.
- context.WithDeadline(parent Context, d time.Time) (Context, CancelFunc) — Adds a Done channel as with WithCancel(), but also arranges that cancellation occurs when the specified deadline expires — the returned function can still be called to cancel the request earlier, as with WithCancel().
- context.WithTimeout(parent Context, timeout time.Duration) (Context, CancelFunc) — As WithDeadline() but with the deadline specified as a duration rather than an absolute time (see the sketch after this list).
Contexts can also be used to pass around request-specific values, such as perhaps a message ID or a validated user. This is a simple key/value store, and code is advised not to use string or any other builtin type for a key, to avoid conflicts with other code that might be using the context — if the key is a package-specific type then there’s no chance of conflicts with other packages. To avoid silly programming errors keys should be constants rather than strings anyway, so making them a unique type should be straightforward.
Storing a value in the context is performed with WithValue(), which accepts a parent Context plus a key and value, both of type any — as with the functions above, a new Context is returned which contains the value.
Code can then recover the value by calling the Value() method and passing in the key — it’s generally a good idea for code to define its own type-specific wrapper for this to avoid mistakes.
Here’s a simple example of how you might go about handling a context which holds two values.
type MessageId int64

// Not exported because the wrapper functions below should be used.
type contextKey int

const (
    contextKeyMessageId contextKey = iota
    contextKeyUser
)

func NewContext(ctx context.Context, msgId MessageId, user string) context.Context {
    ctx = context.WithValue(ctx, contextKeyMessageId, msgId)
    return context.WithValue(ctx, contextKeyUser, user)
}

func MessageIdFromContext(ctx context.Context) (MessageId, bool) {
    m, ok := ctx.Value(contextKeyMessageId).(MessageId)
    return m, ok
}

func UserFromContext(ctx context.Context) (string, bool) {
    u, ok := ctx.Value(contextKeyUser).(string)
    return u, ok
}
Some closing thoughts on some of the mechanisms we’ve just looked at.
Type assertions still seem a bit quirky to me, both in their syntax and their behaviour. I think the fact that they can cause a panic could be one of those ticking timebombs you run into later in the code — everything works now until someone adds a new variant of a type, and “boom”. You could argue that people should always use the two-value return variant, of course, but it’s easy to forget if you’re not an expert. Everyone knows to be careful with guns, but they still (usually) have safety catches.
In terms of error handling, the basic use of error is simple enough, and it’s useful to somewhat standardise the type of errors, in the same way that it’s useful to have all exceptions derived from a single parent class in languages that use them. However, the use of wrapped errors seems like a bit of a mess at present. Any function which raises these wrapped errors may still list its return type as error, and that means some awkward use of type assertions and/or switch statements to handle wrapped errors properly. When only a single error could be wrapped, errors.Unwrap() provided a convenient interface for doing this, since it hid the type handling as an implementation detail.
It seems to me that the addition of something like errors.UnwrapAllErrors() would have been easy — it would always return []error, which would be empty if there were no wrapped errors, contain a single item if Unwrap() returns error, and contain potentially multiple items if Unwrap() returns []error. The support for multiple wrapped errors was only added recently, so perhaps additional facilities may be forthcoming. But it’s a little disappointing nonetheless.
That said, I suspect a lot of people will handle all of these things by simply printing them, so these concerns may not apply to many developers.
The string manipulations all seem straightforward enough, and the printf() semantics are a pleasant blast from the past for me — I’ve always thought that many more modern languages have sacrificed the commendable flexibility and concision of printf() for the sake of making it easier to stringify user-defined types. I don’t see these as contradictory goals, and Go does a competent job, in my view, of allowing types to define a string form with a String() method whilst still allowing flexible formatting of builtin types.
The file I/O facilities seemed reasonable, although I find the read size being determined entirely by the size of the destination buffer (assuming a large file) a slightly messy abstraction. It’s certainly not the end of the world, mind you. Conversely, the ability to embed content within a binary is quite handy, and the fact that an entire directory tree can be embedded and accessed as such is a neat trick.
So that’s it for this article, and I think I’ve probably covered all I’m going to on Go, unless I come across any particularly interesting problems or discoveries in the future — but I’m well aware I ended the previous article with something similar. This time, however, I’ve honestly run out of ideas for anything I might decide to cover!
I hope this has been interesting and/or useful, and until my next article, have a great day!
[1] Although note that in Go the constant is O_CREATE, whereas the flag defined for the C open() call is O_CREAT, missing the final letter. They mean the same thing, though.
[2] As an aside, I made sure to run all the commands I’m using here several times and used the most recent run, so they should all have been run with a hot filesystem cache for a fairer comparison.