Concurrency == GolangIn this article, I attempt to analyze (explain) the different concurrency design patterns by solving the Wordcount problem in a MapReduce fashion.
Jayaganesh KalyanasundaramBlockedUnblockFollowFollowingMay 22Developers at Google LLC built Golang in an attempt to solve concurrency issues in a scalable manner across the distributed setup.
Wordcount problem basically demands the count of each word across multiple documents.
This could be directly solved by keeping a counter for each word and incrementing the same on its occurrence.
This solution is however not scalable over millions of files (sending the files’ content from each node to the master is not feasible).
Hence we apply the MapReduce paradigm to solve this.
Here the Map part would be to collect the Wordcount from each file and the Reduce part would be to digest this list across all the files to generate the final Wordcount (shuffle is implicitly handled).
Example:File1:the author is really really awesomeFile2:jg is awesomeFile3:jg is the authorThe result from the mapper:File1:the –> 1author –> 1is –> 1awesome–> 1really –> 2File2:jg –> 1is –> 1awesome–> 1File3:jg –> 1is –> 1the –> 1author –> 1The result from the reducer (Wordcount):the –> 1 + 0 + 1 = 2author –> 1 + 0 + 1 = 2is –> 1 + 1 + 1 = 3awesome–> 1 + 1 + 0 = 2really –> 2 + 0 + 0 = 2jg –> 0 + 1 + 1 = 2We’ll start with the most simple implementation (without exploiting the very nature of Go).
go contains some helper functions to parse the file/directory structures and extract the words.
The obvious way to optimize would be to concurrently compute the Map for each file and reduce at once.
We could run the Mapper for each file parallelly (concurrently) and append the list in concurrent Go routines.
We need a mutex to navigate race conditions in lists appending.
So essentially we have run the Mapper function for multiple files in parallel (assuming multi-cores) but each transaction (appending the result) is secured with a Mutex (since the same memory is operated on by different routines).
This brings us to NO GAIN in run time optimization :(Channels come to our rescue!Channels are a typed conduit through which you can send and receive values.
The send and receive operations are blocked before the other end is ready (hence sync without explicit locks).
So, we could use a channel to send the results computed from the Mapper and process the same through the Reducer at the receiver end.
We need the number of files explicitly to know when to stop receiving.
Now the Mappers operate in concurrence and the results are then and there sent to the Reducer for processing.
This code snippet, however, demands the knowledge of the total number of files for the Reducer to act upon.
This could be resolved by closing the channel once all the send operations are done to the channel and iterating over the range of values in the channel at the receiver’s end.
To monitor the completion at the Sender’s end, WaitGroup can be used.
Observe that the final Reduced result is dumped to a channel.
Clear: WaitGroup is used to ensure all the Mapper results are sent to the channel.
Not Clear: Why is the result obtained from a channel?.Why can’t we use a simple Map (pass by reference)?Reason: Firstly, Maps are read/write sensitive.
it panics when read and write operations take place simultaneously.
Secondly, since the Reducer is run on a separate thread (Go routine), the final(Value) may not be the final version of the map after the entire Reducer processing is done.
Both these issues are solved with a channel (the channel is receiving in main.
go and the final version of the map is sent from reducer.
This is my analysis and explanation of some of the crucial features of Golang and its support for concurrency.
Golang is synonymous to concurrency and is widely used in scaling and distributed systems (even Docker and Kubernetes are written in Go).
Hope people discover the beauty of Golang and contribute to the same!.