I am not a competitive person and if you would consider drinking Milo at a pace of a dozen Tetra Paks an hour working well under stress; then, yes, I also work extremely well under stress.

So how exactly did I get myself into a situation where I'm submitting my registration and stress-googling (yes that is a thing) past runs of this challenge and how it kinda went? I too, do not know the answer to that question but I bet deep down it's something along the lines of padding the portfolio.

But nevertheless, I am here to share my musings through my experience at the challenge.

The Entry Challenge

We were given a 3-page document of the challenge we had to solve in either Java, Javascript or Python in order to qualify for the actual challenge. See, when I first saw the document I was literally seeing stars, confused would be an understatement. So after an hour or so of hardcore googling (yes, it's in dictionaries now) I think I've gotten a gist of it.. Let me attempt to simplify.

Input

We were given an input of a comma-separated-value stream in the following format: timestamp,ticker,quantity,price

So an example of the stream would look something like this:

[
  "00:02,AAPL,2,155.10",
  "00:02,META,3,161.15",
  "00:02,META,4,160.38",
  "00:12,AAPL,6,154.80",
  "00:13,AAPL,4,154.22",
  "00:16,META,4,160.30",
  "00:16,AAPL,7,156.26",
  ...
]

Part 1: Aggregate

For the first part of the challenge, we had to aggregate the stream by timestamp in chronological order and output in the following format: timestamp,ticker1,cumulative_quantity1,cumulative_notional1,ticker2,cumulative_quantity2,cumulative_notional2...

The value of the notional property is quantity * price, and cumulative just means.. the sum of those values grouped by the timestamp and ticker.

That might sound pretty confusing and if you're anything like me, examples translate better in my brain. With the example input I showed earlier, the output for this part should look like this:

[
  "00:02,AAPL,2,310.2,META,7,1124.97", 
  "00:12,AAPL,6,928.80",
  "00:13,AAPL,4,616.88", 
  "00:16,AAPL,7,1093.82,META,4,641.2"
]

Okay so! Now that we've got an understanding of the problem and what is required, lets think strategies.

Dictionaries?

At this point my first idea was to use dictionaries and my thought process of using dictionaries would kinda work like this:

Initialise a dictionary that would kinda be structured like so:
```
dict = {
"00:02": {
 "AAPL": [2, 310.2],
 "META": [7, 1124.97],
}
...
}
```
And to create a dictionary like that we would need to iterate through everything in the stream, check for the keys, and then update the values accordingly.
Iterate the keys of the dictionary and join them into the appropriate output list of strings! See, apparently as of Python 3.7, dictionaries are already ordered, so there's no need for further sorting.

A singular sort and loop

For some reason, I've been kind of conditioned to be afraid of loops. The more, the scarier. Why? Because for some reason my head thinks that for every loop I introduce to a solution, the solution would execute for 22 years longer.

Well, I've got the run slower part right but 22 years longer might be an extreme exaggeration. But you get my meaning.

How do we solve this with as little iteration as possible? Well one thing I noticed is that timestamp and ticker was positioned as the first 2 elements of each row of the input.

Intriguing. Because, if we were to sort the list at the start we would get this:

[
  "00:02,AAPL,2,155.10", 
  "00:02,META,3,161.15", 
  "00:02,META,4,160.38", 
  "00:12,AAPL,6,154.80", 
  "00:13,AAPL,4,154.22", 
  "00:16,AAPL,7,156.26", 
  "00:16,META,4,160.30"
]

A-ha! By sorting it first, we can ensure that when we iterate through the sorted list, we can keep track of changes in timestamp and ticker.

For each element in the sorted list we have to

Whenever the timestamp of ticker changes,
- Reset the cum_quantity and cum_notional values to 0, and 0.0 respectively.
- Add on to the result list
If not, add onto the cum_quantity and cum_notional values!

And if you're giggling at cum_quantity like me when working on this solution, you and me? We're the same.

Anyway, talk is cheap here is the code I used to solve this problem

def to_cumulative(stream: list):
    stream.sort()

    result = []
    id = -1
    current_timestamp = ""
    current_ticker = ""
    cum_quantity = 0
    cum_notional = 0.0

    for i, line in enumerate(stream):
        splitted = line.split(",")
        timestamp, ticker, quantity, price = (splitted[0], splitted[1],
                                              int(splitted[2]),
                                              float(splitted[3]))

        if current_timestamp != timestamp:
            if i > 0:
                result[id] += f",{current_ticker},{cum_quantity},{cum_notional:.2f}"
            current_timestamp = timestamp
            current_ticker = ticker
            cum_quantity = 0
            cum_notional = 0.0
            result.append(timestamp)
            id += 1

        if current_ticker != ticker:
            result[id] += f",{current_ticker},{cum_quantity},{cum_notional:.2f}"
            current_ticker = ticker
            cum_quantity = 0
            cum_notional = 0.0

        cum_quantity += quantity
        cum_notional += quantity * price

    result[id] += f",{current_ticker},{cum_quantity},{cum_notional:.2f}"
    return result

If you've noticed, I used some tricks here and there to attempt shortening and simplifying my code. But it should work perfectly fine if you followed the pseudocode/bullet point thing from earlier.

And you thought that was enough for an entry challenge, we now proceed to part two.

Part 2: Chunks

They say a good body of art always has a sequel (nobody says that). So just like Folklore and Evermore, we have part two of the challenge. The sister part, if you will.

Aggregate the stream by time in chronological order, but this time each output record is 'delayed' by only reporting cumulative quantities in blocks of quantity_block, the leftover quantity is not included from the cumulative values.

That was more or less what they gave us in the document and I'm not gonna lie, I was re-reading this part for a good few minutes.

Basically.. for this second part, for a given quantity_block, let's say.. 5, we are supposed to generate the following output:

[
  "00:02,META,5,804.21", 
  "00:12,AAPL,5,774.60", 
  "00:13,AAPL,5,772.84", 
  "00:16,AAPL,5,777.22", 
  "00:16,META,5,801.66"
]

Well, this part isn't that hard after the previous. I re-used the same trick of sorting the input, tracking timestamp and ticker changes and implemented checks where is made sense. Then finally, add to a result list.

Here is the code:

def to_cumulative_delayed(stream: list, quantity_block: int):
    def filter(x):
        splitted = x.split(",")
        return f"{splitted[1]},{splitted[0]}"

    stream.sort(key=filter)

    result = []
    current_ticker = ""
    cum_quantity = 0
    cum_notional = 0.0

    for line in stream:
        splitted = line.split(",")
        timestamp, ticker, quantity, price = (splitted[0], splitted[1],
                                              int(splitted[2]),
                                              float(splitted[3]))

        if current_ticker != ticker:
            current_ticker = ticker
            cum_quantity = 0
            cum_notional = 0.0

        while cum_quantity + quantity > quantity_block:
            merge_quantity = quantity_block - cum_quantity

            quantity -= merge_quantity

            cum_quantity += merge_quantity
            cum_notional += merge_quantity * price

            if cum_quantity % quantity_block == 0:
                result.append(
                    f"{timestamp},{current_ticker},{cum_quantity},{cum_notional:.2f}"
                )
                cum_quantity = 0
                cum_notional = 0.0

        cum_quantity += quantity
        cum_notional += quantity * price

        if cum_quantity % quantity_block == 0:
            result.append(
                f"{timestamp},{current_ticker},{cum_quantity},{cum_notional:.2f}"
            )
            cum_quantity = 0
            cum_notional = 0.0

    if cum_quantity > 0 and cum_quantity % quantity_block == 0:
        result.append(f"{timestamp},{current_ticker},{cum_quantity},{cum_notional:.2f}")

    result.sort()
    return result

If you're sharp-eyed, you would first notice that my initial sort is different this time, I first sort by ticker then by timestamp. Think about it, we need to chunk the quantity in quantity_block for each ticker.

The next peculiar block of code is this:

while cum_quantity + quantity > quantity_block:
            merge_quantity = quantity_block - cum_quantity

            quantity -= merge_quantity

            cum_quantity += merge_quantity
            cum_notional += merge_quantity * price

            if cum_quantity % quantity_block == 0:
                result.append(
                    f"{timestamp},{current_ticker},{cum_quantity},{cum_notional:.2f}"
                )
                cum_quantity = 0
                cum_notional = 0.0

WHAT DOES IT MEAN? - Miley Cyrus

When we are given a quantity_block of 5 and the quantity is 23 what do we do? With a while-loop, we can continually check if the quantity is over the 5 and if so, add it to the results.

Closing

As of publication, I still do not know if my team is accepted into the coding challenge. But in the event that we do, I'll probably share my solutions, thought process, ramblings and stress-complain about the entire experience in the form of blog posts.

Feel free to let me know how I can do some things better and if you have a different solution of your own, do let me know about it! It's always interesting to see how other people solve a challenge.

Resources

The source code of my submission can be found on repl.it!

And if you're wondering, yeah, our team name is untestedcode. No, I'm not explaining it.

The Kaioru Blog