A couple of articles ago I wrote about goroutines and how great they are. What I didn't mention in that article is that things can go really badly if you don't use them properly. One of the most important things to keep in mind is that if several goroutines are going to modify the same value stored in a memory address, you need to make that access thread safe (keep in mind that goroutines are essentially very cheap threads scheduled by the Go runtime).
What exactly do I mean by this? Let me explain through an example:
package main

import (
	"fmt"
	"time"
)

type Country struct {
	Name       string
	Continent  string
	Population int32
}

// updatePopulation adds newBorns to the shared Country with no synchronization.
func updatePopulation(c *Country, newBorns int32) {
	c.Population += newBorns
	fmt.Printf("New population of %v is %v\n", c.Name, c.Population)
}

func main() {
	totalStates := 50 // total number of sources (US states)
	us := Country{"USA", "North America", 32000000}
	for i := 0; i < totalStates; i++ {
		go updatePopulation(&us, int32(i)) // one goroutine per state
	}
	time.Sleep(time.Second * 5) // this is just so that we don't need channels
}
Here is a very simple example. Say we have an application that is tracking the population of countries in real time. The population of a country is dynamic: babies get born all the time, and sadly some people pass away all the time.
The USA is a huge country, which consists of 50 states. So, for the sake of my example, each updatePopulation call will simulate a single call to the Census Agency of one state. Since we're in a cheerful mood (and also because I need it to prove my point), we'll only register the newborns and not the departed.
Since we want this app to truly be a real-time tracker, we don't want to make synchronous API calls; we want to make them concurrently, hence the go in front of the updatePopulation function call.
That should do it, right? Let's see the result:
New population of USA is 32001199
New population of USA is 32001030
New population of USA is 32001206
New population of USA is 32001176
New population of USA is 32000957
New population of USA is 32001005
Hmm... I gotta say, I'm kind of concerned here. If we are only tracking newborns, and not tracking the deceased, how come our population is actually lower in the last line than in the first line?
Well, it's because we haven't implemented thread safety. Our various goroutines are accessing the same memory address without any respect for order. c.Population += newBorns is really a read, an add, and a write, so two goroutines can read the same old value and one of the updates gets lost, and the prints can come out in any order. It's like those old people at the supermarket that pretend they don't see the line.
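To make the lost update concrete, here is a small sketch that replays one unlucky interleaving by hand, on a single goroutine, so the result is the same every time. The lostUpdate function and its parameters are hypothetical names I made up for this illustration; a real race picks its interleaving at random.

```go
package main

import "fmt"

// lostUpdate replays, step by step, one unlucky interleaving of two
// unsynchronized "population += n" operations.
// The name and parameters are hypothetical, for illustration only.
func lostUpdate(start, a, b int32) int32 {
	population := start

	// Both "goroutines" read the current value before either one writes.
	readA := population
	readB := population

	// Each adds its newborns to the stale copy it read.
	population = readA + a // first write
	population = readB + b // second write overwrites the first

	return population
}

func main() {
	fmt.Println(lostUpdate(100, 5, 7)) // prints 107, not the expected 112
}
```

The five newborns from the first write simply vanish, which is the kind of thing that happens to our population counter.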
We need to implement some order here.
Enter Mutexes.
Mutex is short for mutual exclusion lock. It's used so that when one thread (or goroutine, in the case of Golang) is accessing a value inside a memory address, it can lock out the other threads so they have to wait in line. This guarantees that only one goroutine at a time reads or changes the value. Let's implement:
package main

import (
	"fmt"
	"sync"
	"time"
)

type Country struct {
	Name       string
	Continent  string
	Population int32
	mu         sync.Mutex // guards Population
}

// updatePopulation holds the lock for both the update and the print.
func updatePopulation(c *Country, newBorns int32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.Population += newBorns
	fmt.Printf("New population of %v is %v\n", c.Name, c.Population)
}

func main() {
	totalStates := 50 // total number of sources (US states)
	// mu is left out on purpose: its zero value is an unlocked mutex.
	us := Country{Name: "USA", Continent: "North America", Population: 32000000}
	for i := 0; i < totalStates; i++ {
		go updatePopulation(&us, int32(i))
	}
	time.Sleep(time.Second * 5)
}
The sync.Mutex is a struct that we use for implementing mutexes in Go. Its zero value is an unlocked mutex, as you can see from the standard library code:
// A Mutex is a mutual exclusion lock.
// The zero value for a Mutex is an unlocked mutex.
//
// A Mutex must not be copied after first use.
type Mutex struct {
state int32
sema uint32
}
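Two things in that comment are worth a quick illustration: the zero value is ready to use, and a mutex must not be copied. Here is a minimal sketch of both. The counter type, the inc method, and the countTo helper are names I made up for it, and it uses sync.WaitGroup rather than time.Sleep to wait for the goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical type used only for this sketch.
type counter struct {
	mu sync.Mutex // zero value: unlocked and ready to use
	n  int
}

// inc uses a pointer receiver so the mutex is shared, never copied.
func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// countTo starts n goroutines that each increment the counter once.
func countTo(n int) int {
	var c counter // no explicit mutex initialization needed
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc()
		}()
	}
	wg.Wait() // all goroutines are done, so reading c.n is safe
	return c.n
}

func main() {
	fmt.Println(countTo(100)) // prints 100
}
```

If inc had a value receiver instead, every call would lock its own private copy of the mutex and protect nothing.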
Now, let's run our app again and see what the result will be:
New population of USA is 32000979
New population of USA is 32001023
New population of USA is 32001072
New population of USA is 32001108
New population of USA is 32001146
New population of USA is 32001185
New population of USA is 32001225
This looks good. The values only ever increase, meaning that order is restored. Now, you might wonder why we put defer c.mu.Unlock() immediately under the line of code where we acquire the lock. The reason is that we need to avoid a deadlock. Deadlocks are a hazard of using mutexes and must be avoided at all costs. Imagine something happens between the locking of a mutex and its unlocking that makes the function exit early, such as an early return or a panic. The lock would then be held indefinitely, and none of the other goroutines would ever be able to acquire it. This is why we use the defer keyword: it guarantees that no matter how the function exits, the mutex gets unlocked on the way out.
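Here is a small sketch of that guarantee, assuming a variant of our function that can bail out early. The addNewborns name and its validation rule are hypothetical, added only to create an early exit path.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type Country struct {
	Name       string
	Population int32
	mu         sync.Mutex
}

// addNewborns is a hypothetical variant of updatePopulation that can exit early.
func addNewborns(c *Country, newBorns int32) error {
	c.mu.Lock()
	defer c.mu.Unlock() // runs on every return path, and also on panic

	if newBorns < 0 {
		return errors.New("newborn count cannot be negative") // early exit
	}
	c.Population += newBorns
	return nil
}

func main() {
	us := Country{Name: "USA", Population: 32000000}

	fmt.Println(addNewborns(&us, -1)) // prints the error; the lock is released anyway

	// This call would block forever if the early return had kept the lock.
	fmt.Println(addNewborns(&us, 10), us.Population) // prints <nil> 32000010
}
```

Without the defer, you would have to remember a manual Unlock before every return, and the first one you forget turns into a deadlock.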
I'd also like to mention that besides the standard sync.Mutex that we used above, there also exists another mutex - sync.RWMutex.
That's right folks, the complexity is far from finished. Whoever told you that Golang is easy (that would be me, just a few articles ago) really had no idea about the low-level concepts you need to master before using it responsibly.
The point is that each time a goroutine acquires a lock, the other goroutines have to wait in line, thus slowing down the overall performance. But what if these goroutines just want to read the value from that memory address? That's safe, right? Why would the goroutine that holds the lock hog the memory address all to itself, if the others promise not to change anything, just read the value and go on with their business?
So if you change the sync.Mutex field in the Country struct to be sync.RWMutex, you now have the possibility of getting even more functionality:
1. Lock(): only one go routine reads/writes at a time by acquiring the lock.
2. RLock(): multiple go routines can read(not write) at a time by acquiring the lock.
I actually copy/pasted these two definitions from Stack Overflow (like any well-mannered programmer would do). Here is the original post:
stackoverflow.com/questions/53427824/what-i..
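As a rough sketch of how the two kinds of lock could fit into our example, here is the Country struct with a sync.RWMutex. The readPopulation function is a hypothetical addition of mine, and I've used sync.WaitGroup to wait for the goroutines instead of sleeping.

```go
package main

import (
	"fmt"
	"sync"
)

type Country struct {
	Name       string
	Population int32
	mu         sync.RWMutex
}

// readPopulation takes the shared (read) lock; many readers may hold it at once.
func readPopulation(c *Country) int32 {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.Population
}

// updatePopulation takes the exclusive (write) lock; readers and other writers wait.
func updatePopulation(c *Country, newBorns int32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.Population += newBorns
}

func main() {
	us := Country{Name: "USA", Population: 32000000}
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(2)
		go func(n int32) { // one writer per state
			defer wg.Done()
			updatePopulation(&us, n)
		}(int32(i))
		go func() { // one reader per state
			defer wg.Done()
			_ = readPopulation(&us)
		}()
	}
	wg.Wait()
	fmt.Println(readPopulation(&us)) // prints 32001225
}
```

Readers only block while a writer holds the lock, so this pays off mostly when reads heavily outnumber writes.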
Finally, I want to address the fact that most of you who came from Python and JavaScript, like me, were probably shocked to learn that something like a mutex even exists. The reason we rarely heard of this concept in our dear dynamically typed, interpreted languages is that in the case of Python (CPython to be specific) the GIL (Global Interpreter Lock) acts as a gigantic mutex over the interpreter, so only one thread runs Python code at a time and explicit locks come up far less often. You can still implement a Lock and Unlock by using the threading library, but even the documentation says:
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing or concurrent.futures.ProcessPoolExecutor. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
Basically they are saying: just use the library for IO (but I would suggest using asyncio), and if you really need threads running in parallel on multiple cores, Python probably isn't that great of an option.
For JavaScript devs, the reason is even simpler: you never used mutexes because your JavaScript code runs on a single thread. And, guess what, it does its job pretty well with it. Once again, if you are doing compute-heavy operations that are time sensitive, go with C++ or Rust. Luckily, most modern backend servers don't need compute-heavy operations; they are almost entirely IO-bound and spend most of their time idle, waiting for a db server to return some data. Golang is beautiful because its runtime spreads cheap goroutines across multiple OS threads, so you get the benefits of both.
That's all for this article. If I missed anything please leave a comment. Thanks for reading!