Photo by Aarón Blanco Tejedor on Unsplash
Attempting the Entry Challenge for the 2022 CodeIT Suisse Coding Challenge
An extreme mouthful of a title, but I call it a bangin' start
I am not a competitive person and if you would consider drinking Milo at a pace of a dozen Tetra Paks an hour working well under stress; then, yes, I also work extremely well under stress.
So how exactly did I get myself into a situation where I'm submitting my registration and stress-googling (yes that is a thing) past runs of this challenge and how it kinda went? I too, do not know the answer to that question but I bet deep down it's something along the lines of padding the portfolio.
But nevertheless, I am here to share my musings through my experience at the challenge.
The Entry Challenge
We were given a 3-page document of the challenge we had to solve in either Java, Javascript or Python in order to qualify for the actual challenge. See, when I first saw the document I was literally seeing stars, confused would be an understatement. So after an hour or so of hardcore googling (yes, it's in dictionaries now) I think I've gotten a gist of it.. Let me attempt to simplify.
Input
We were given an input of a comma-separated-value stream in the following format:
timestamp,ticker,quantity,price
So an example of the stream would look something like this:
[
"00:02,AAPL,2,155.10",
"00:02,META,3,161.15",
"00:02,META,4,160.38",
"00:12,AAPL,6,154.80",
"00:13,AAPL,4,154.22",
"00:16,META,4,160.30",
"00:16,AAPL,7,156.26",
...
]
Part 1: Aggregate
For the first part of the challenge, we had to aggregate the stream by timestamp in chronological order and output in the following format:
timestamp,ticker1,cumulative_quantity1,cumulative_notional1,ticker2,cumulative_quantity2,cumulative_notional2...
The value of the notional
property is quantity * price
, and cumulative just means.. the sum of those values grouped by the timestamp and ticker.
That might sound pretty confusing and if you're anything like me, examples translate better in my brain. With the example input I showed earlier, the output for this part should look like this:
[
"00:02,AAPL,2,310.2,META,7,1124.97",
"00:12,AAPL,6,928.80",
"00:13,AAPL,4,616.88",
"00:16,AAPL,7,1093.82,META,4,641.2"
]
Okay so! Now that we've got an understanding of the problem and what is required, lets think strategies.
Dictionaries?
At this point my first idea was to use dictionaries and my thought process of using dictionaries would kinda work like this:
Initialise a dictionary that would kinda be structured like so:
dict = { "00:02": { "AAPL": [2, 310.2], "META": [7, 1124.97], } ... }
And to create a dictionary like that we would need to iterate through everything in the stream, check for the keys, and then update the values accordingly.
Iterate the keys of the dictionary and join them into the appropriate output list of strings! See, apparently as of Python 3.7, dictionaries are already ordered, so there's no need for further sorting.
A singular sort and loop
For some reason, I've been kind of conditioned to be afraid of loops. The more, the scarier. Why? Because for some reason my head thinks that for every loop I introduce to a solution, the solution would execute for 22 years longer.
Well, I've got the run slower part right but 22 years longer might be an extreme exaggeration. But you get my meaning.
How do we solve this with as little iteration as possible? Well one thing I noticed is that timestamp
and ticker
was positioned as the first 2 elements of each row of the input.
Intriguing. Because, if we were to sort the list at the start we would get this:
[
"00:02,AAPL,2,155.10",
"00:02,META,3,161.15",
"00:02,META,4,160.38",
"00:12,AAPL,6,154.80",
"00:13,AAPL,4,154.22",
"00:16,AAPL,7,156.26",
"00:16,META,4,160.30"
]
A-ha! By sorting it first, we can ensure that when we iterate through the sorted list, we can keep track of changes in timestamp and ticker.
For each element in the sorted list we have to
- Whenever the timestamp of ticker changes,
- Reset the
cum_quantity
andcum_notional
values to 0, and 0.0 respectively. - Add on to the
result
list
- Reset the
- If not, add onto the
cum_quantity
andcum_notional
values!
And if you're giggling at cum_quantity
like me when working on this solution, you and me? We're the same.
Anyway, talk is cheap here is the code I used to solve this problem
def to_cumulative(stream: list):
stream.sort()
result = []
id = -1
current_timestamp = ""
current_ticker = ""
cum_quantity = 0
cum_notional = 0.0
for i, line in enumerate(stream):
splitted = line.split(",")
timestamp, ticker, quantity, price = (splitted[0], splitted[1],
int(splitted[2]),
float(splitted[3]))
if current_timestamp != timestamp:
if i > 0:
result[id] += f",{current_ticker},{cum_quantity},{cum_notional:.2f}"
current_timestamp = timestamp
current_ticker = ticker
cum_quantity = 0
cum_notional = 0.0
result.append(timestamp)
id += 1
if current_ticker != ticker:
result[id] += f",{current_ticker},{cum_quantity},{cum_notional:.2f}"
current_ticker = ticker
cum_quantity = 0
cum_notional = 0.0
cum_quantity += quantity
cum_notional += quantity * price
result[id] += f",{current_ticker},{cum_quantity},{cum_notional:.2f}"
return result
If you've noticed, I used some tricks here and there to attempt shortening and simplifying my code. But it should work perfectly fine if you followed the pseudocode/bullet point thing from earlier.
And you thought that was enough for an entry challenge, we now proceed to part two.
Part 2: Chunks
They say a good body of art always has a sequel (nobody says that). So just like Folklore and Evermore, we have part two of the challenge. The sister part, if you will.
Aggregate the stream by time in chronological order, but this time each output record is 'delayed' by only reporting cumulative quantities in blocks of
quantity_block
, the leftover quantity is not included from the cumulative values.
That was more or less what they gave us in the document and I'm not gonna lie, I was re-reading this part for a good few minutes.
Basically.. for this second part, for a given quantity_block
, let's say.. 5
, we are supposed to generate the following output:
[
"00:02,META,5,804.21",
"00:12,AAPL,5,774.60",
"00:13,AAPL,5,772.84",
"00:16,AAPL,5,777.22",
"00:16,META,5,801.66"
]
Well, this part isn't that hard after the previous. I re-used the same trick of sorting the input, tracking timestamp
and ticker
changes and implemented checks where is made sense. Then finally, add to a result
list.
Here is the code:
def to_cumulative_delayed(stream: list, quantity_block: int):
def filter(x):
splitted = x.split(",")
return f"{splitted[1]},{splitted[0]}"
stream.sort(key=filter)
result = []
current_ticker = ""
cum_quantity = 0
cum_notional = 0.0
for line in stream:
splitted = line.split(",")
timestamp, ticker, quantity, price = (splitted[0], splitted[1],
int(splitted[2]),
float(splitted[3]))
if current_ticker != ticker:
current_ticker = ticker
cum_quantity = 0
cum_notional = 0.0
while cum_quantity + quantity > quantity_block:
merge_quantity = quantity_block - cum_quantity
quantity -= merge_quantity
cum_quantity += merge_quantity
cum_notional += merge_quantity * price
if cum_quantity % quantity_block == 0:
result.append(
f"{timestamp},{current_ticker},{cum_quantity},{cum_notional:.2f}"
)
cum_quantity = 0
cum_notional = 0.0
cum_quantity += quantity
cum_notional += quantity * price
if cum_quantity % quantity_block == 0:
result.append(
f"{timestamp},{current_ticker},{cum_quantity},{cum_notional:.2f}"
)
cum_quantity = 0
cum_notional = 0.0
if cum_quantity > 0 and cum_quantity % quantity_block == 0:
result.append(f"{timestamp},{current_ticker},{cum_quantity},{cum_notional:.2f}")
result.sort()
return result
If you're sharp-eyed, you would first notice that my initial sort is different this time, I first sort by ticker
then by timestamp
. Think about it, we need to chunk the quantity
in quantity_block
for each ticker
.
The next peculiar block of code is this:
while cum_quantity + quantity > quantity_block:
merge_quantity = quantity_block - cum_quantity
quantity -= merge_quantity
cum_quantity += merge_quantity
cum_notional += merge_quantity * price
if cum_quantity % quantity_block == 0:
result.append(
f"{timestamp},{current_ticker},{cum_quantity},{cum_notional:.2f}"
)
cum_quantity = 0
cum_notional = 0.0
WHAT DOES IT MEAN? - Miley Cyrus
When we are given a quantity_block
of 5
and the quantity
is 23
what do we do? With a while-loop, we can continually check if the quantity
is over the 5
and if so, add it to the results.
Closing
As of publication, I still do not know if my team is accepted into the coding challenge. But in the event that we do, I'll probably share my solutions, thought process, ramblings and stress-complain about the entire experience in the form of blog posts.
Feel free to let me know how I can do some things better and if you have a different solution of your own, do let me know about it! It's always interesting to see how other people solve a challenge.
Resources
The source code of my submission can be found on repl.it!
And if you're wondering, yeah, our team name is untestedcode
. No, I'm not explaining it.