Phone Interview

HTTP Header

Link: https://www.1point3acres.com/bbs/thread-777968-1-1.html

Part I

Accept-Language: en-US, fr-CA, fr-FR means that the reader would accept:

English as spoken in the United States (most preferred)
French as spoken in Canada
French as spoken in France (least preferred)

Write a function that receives two arguments: an Accept-Language header value as a string and a set of supported languages, and returns the list of language tags that will work for the request. The language tags should be returned in descending order of preference (the same order as they appeared in the header).

In addition to writing this function, you should use tests to demonstrate that it’s correct, either via an existing testing system or one you create.

Examples:

parse_accept_language(
    "en-US, fr-CA, fr-FR",  # the client's Accept-Language header, a string
     ["fr-FR", "en-US"]      # the server's supported languages, a set of strings
     )
returns: ["en-US", "fr-FR"]

parse_accept_language("fr-CA, fr-FR", ["en-US", "fr-FR"])
returns: ["fr-FR"]

parse_accept_language("en-US", ["en-US", "fr-CA"])
returns: ["en-US"]

def parse_accept_language(s, languages): -> list<str>
    """
    return the list of language tags that will work for the request
    """
    langList = s.split(',')

Wish list

// Step 1)
//
// Imagine an Airbnb-like vacation rental service, where users in different cities can exchange their apartment with
// another user for a week. Each user compiles a wishlist of the apartments they like. These wishlists are ordered,
// so the top apartment on a wishlist is that user's first choice for where they would like to spend a vacation.
// You will be asked to write part of the code that will help an algorithm find pairs of users who would like to
// swap with each other.
//
// Given a set of users, each with an *ordered* wishlist of other users' apartments:
//
// a's wishlist: c d
// b's wishlist: d a c
// c's wishlist: a b
// d's wishlist: c a b
//
// The first user in each wishlist is the user's first-choice for whose apartment they would like to swap into.
// Write a function called has_mutual_first_choice() which takes a username and returns true if that user and
// another user are each other's first choice, and otherwise returns false.
//
// has_mutual_first_choice('a') // true (a and c)
// has_mutual_first_choice('b') // false (b's first choice does not *mutually* consider b as their first choice)
//
// Then expand the base case beyond just "first" choices, to include all "mutually ranked choices". Write another
// function which takes a username and an option called "rank" to indicate the wishlist rank to query on. If given
// a rank of 0, you should check for a first choice pair, as before. If given 1, you should check for a pair of
// users who are each others' second-choice. Call your new function has_mutual_pair_for_rank() and when done,
// refactor has_mutual_first_choice() to depend on your new function.
//
// has_mutual_pair_for_rank('a', 0) // true (a and c)
// has_mutual_pair_for_rank('a', 1) // true (a and d are mutually each others' second-choice)]

# Step 2)
#
# Every wishlist entry in the network is either "mutually ranked" or "not mutually ranked" depending on the rank
# the other user gives that user's apartment in return.
#
# The most common operation in the network is incrementing the rank of a single wishlist entry on a single user.
# This swaps the entry with the entry above it in that user's list. Imagine that, when this occurs, the system must
# recompute the "mutually-ranked-ness" of any pairings that may have changed.
#
# Write a function that takes a username and a rank representing the entry whose rank is being bumped up. Return an
# array of the users whose pairings with the given user *would* gain or lose mutually-ranked status as a result of
# the change, if it were to take place. Call your function changed_pairings()
#
# // if d's second choice becomes their first choice, a and d will no longer be a mutually ranked pair
# changed_pairings('d', 1) // returns ['a']
#
# // if b's third choice becomes their second choice, c and b will become a mutually ranked pair (mutual second-choices)
# changed_pairings('b', 2) // returns ['c']
#
# // if b's second choice becomes their first choice, no mutually-ranked pairings are affected
# changed_pairings('b', 1) // returns []
#
# Step 3)
#
# A user's last choice is the last entry on their wishlist. Their second-to-last choice is second to last on their
# wishlist. This can be continued to define third-to-last choice, and so on, always counting from the end of the
# user's list of apartments.
#
# A mutually-ranked-anti-pairing is one where both parties rank each other's apartments identically near to (or far
# from) the *end* of each of their wishlists.
#
# Implement changed_antipairings(username, rank) to return an array of the users whose pairings with the given user
# either gain or lose mutually-ranked-anti-pairing status as a result of the change. Note that, as before, the
# username and rank passed in identify the entry whose rank is being bumped up, so (a, 1) would refer to a's second-choice.
#
# // if b's third choice becomes their second choice, b and c will no longer be a mutually-ranked‍‌‍‍‍‌‍‌‍‍‌‌‍‍‍‌‍‍ anti-pairing
# changed_antipairings('b', 2) // returns ['c']
#
# // if a's second choice becomes their first choice, a and c will be no longer be a mutually ranked anti-pairing
# // in addition, a and d will become a mutually ranked anti-pairing (the second-to-last choice of each other)
# changed_antipairings('a', 1) // returns ['c', 'd']

Currency conversion

讀取一個string, 實現currency conversion相關的需求
例子: USD:AUD:1.4,CAD:USD:0.8,USD:JPY:110
Part 1: 給定一組currency code, 如果input string裏有直接記錄conversion rate的話, 返回該conversion rate
Part 2: 如果conversion rate可經由一次轉換計算岀來(如CAD:AUD), 也需返回
Part 3: 計算Part 2中的conversion rate時, 需用”best conversion calculation available”
Part 4: 應能返回所有可從input string計算岀的conversion rate pa‍‌‍‍‍‌‍‌‍‍‌‌‍‍‍‌‍‍irs

貨幣兌換是需要雙向的

i was also asked the same question. Make sure to use BFS when they ask you to convert a currency. It will help to solve the follow up questions. And be fast while coding.

Feature

Consider a system of User and Feature objects, where a Feature describes the availability of
an ongoing project or change in functionality. Features may be generally available, limited to
certain regions, and/or intended for an A/B test. We need to write a deterministic system for
identifying which Features are active for a given User. We also want to write tests to ensure
that our system is working as intended each time we run it, given the example User and Feature
data below.  
Users:
[
    { "id": 0, "name": "eva",    "location": "US" },
    { "id": 1, "name": "tess",   "location": "US" },
    { "id": 2, "name": "rahool", "location": "CA" },
    { "id": 3, "name": "amanda", "location": "CA" }
]
Features:
[
    {
        "id": "annual_sale",
        "locations": ["US"],
        "abTest": true,
    },
    {
        "id": "enhanced_comments",
        "abTest": true,
    },
    {
        "id": "canada_promotion",
        "locations": ["CA"],
    }
]
## Part 1
Write a function, `get_user_features(user, features)` which takes a single User object and a list
of Feature objects and returns a set of the Feature IDs that apply to the given User.
A User has three properties: an integer `id`, a string `name`, and a 2-letter country code
string `location`.
A Feature has three properties: a unique string `id`, an optional array of 2-letter country code
strings, `locations`, which limits the feature to users with‍‌‍‍‍‌‍‌‍‍‌‌‍‍‍‌‍‍ a matching location, and an optional
boolean, `abTest`, which when set to true will only apply the feature to users with an even user ID.
If `abTest` or `locations` are absent for a Feature, they have no effect.
Given the features and users, the following results are expected:
| User   | Features                            |
| ------ | ----------------------------------- |
| eva    | annual_sale, enhanced_comments      |
| tess   | N/A                                 |
| rahool | enhanced_comments, canada_promotion |
| amanda | canada_promotion                    |
*/

Fraud Rule

Certain colleagues upstairs have let us know that our policy of approving every Authorization Request is frowned upon by our investors. Luckily, they informed us of a separate system that our data science department built which can detect Authorization Requests that are fraudulent. With fraud, the data scientists say, nothing is set in stone. The criteria for fraud change constantly, so their system develops new Fraud Rules all the time. They’ve given us access to their system’s stream of Fraud Rules, in the following format:
time,field,value
1,merchant,bobs_burgers
20,card_number,4242111111111111
The first Fraud Rule indicates: "from 1 second onward, any Authorization Request with merchant of bobs_burgers is fraudulent."
The second indicates: "from 20 seconds onward, any Authorization Request with card_number of 4242111111111111 is fraudulent."
Once a Fraud Rule is introduced, it is applicable in perpetuity. A new Fraud Rule cannot be applied to a previous request.
Example:
TIME EVENT
*      5 fraud rule A    <-- can apply to R3, R4, and R5
|     10 request R3
|     20 request R4
| *   30 fraud rule B    <-- can only apply to R5
| |   40 request R5
v v * 4‍‌‍‍‍‌‍‌‍‍‌‌‍‍‍‌‍‍5 fraud rule C    <-- does not apply to R3, R4, R5
Integrate Fraud Rule data: REJECT fraudulent Authorization Requests. For the same input as before:
timestamp_seconds,unique_id,amount,card_number,merchant
5,R1,5.60,4242424242424242,bobs_burgers
10,R2,500.00,4242111111111111,a_corp
And the fraud rule input:
time,field,value
1,merchant,bobs_burgers
20,card_number,4242111111111111
The expected report:
5 R1 5.60 REJECT
10 R2 500.00 APPROVE

Server removal time

Throughout this interview, we'll write code to analyze a simple server process uptime log. These logs are much simplified, and are just strings of space separated 0's and 1's. The log is a string of binary digits (e.g. "0 0 1 0"). Each digit corresponds to 1 hour of the server running:
"1" = <crashed>, "down" // server process crashed during the hour
"0" = <didn't crash>, "up" // server process did not crash during the hour
EXAMPLE: A server with log "0 0 1 0" ran for 4 hours and its process crashed during hour #3
   hour: |1|2|3|4|
   log : |0|0|1|0|
              ^
              |
             down during hour #3
We can *permanently remove* a server at the beginning of any hour during its operation. A server is on the network until it is removed. Note that a server stays POWERED ON after removal, it's just not on the network.
We'd like to understand the best times to remove a server. So let's introduce an aggregate metric called a "penalty" for removing a server at a bad time.
EXAMPLE: Remove a server with log "0 0 1 0"
    hour :  | 1 | 2 | 3 | 4 |
    log  :  | 0 | 0 | 1 | 0 |
remove_at:  0   1   2   3   4   // remove_at being `x` means "server removed before hour `x+1`"
            ^               ^
            |               |
     before hour #1         after hour #4
We define our penalty like this:
+1 penalty for each DOWN hour when a server is on the network
+1 penalty for each UP hour after a server has been removed
Further Examples:
EXAMPLE:
   hour :   1 2 3 4     // total penalty = 3  (3 server-up hours after remove)
   log  :   0 0 1 0
           ^
           |
         remove_at = 0
   hour :   1 2 3 4     // total penalty = 1  (1 server-down hour before remove)
   log  :   0 0 1 0
                   ^
                   |
                 remove_at = 4
Note that for a server log of length `n` hours, the remove_at variable can range from 0, meaning "before the first hour" to n, meaning "after the final hour".
1a) Write a function: compute_penalty, that computes the total penalty, given a server log (as a string) AND a time at which we removed the server from the network (call that  variable remove_at). In addition to writing this function, you should use tests to demonstrate that it's correct.
## Examples
compute_penalty("0 0 1 0", 0) should return 3
compute_penalty("0 0 1 0", 4) should return 1
1b) Use your answer for compute_penalty to write another function: find_best_removal_time, that returns the best remove_at hour, given a server log. Again, you should use tests to demonstrate that it's correct.
## Example
find_best_removal_time("0 0 1 1") should return 2
2a) Now that we're able to analyze single server logs, let's analyze some aggregate logs. Aggregate logs are text files that c‍‌‍‍‍‌‍‌‍‍‌‌‍‍‍‌‍‍ontain lots of logs. The files contain only BEGIN, END, 1, 0, spaces and newlines. Aggregate logs include some servers that aren’t actually finished, so we might have some BEGINs scattered throughout. We'll only consider inner BEGINs and ENDs to be valid log sequences. Put another way, any sequence of 0s and 1s surrounded by BEGIN and END forms a valid sequence. For example, the sequence "BEGIN BEGIN BEGIN 1 1 BEGIN 0 0 END 1 1 BEGIN" has only one valid sequence "BEGIN 0 0 END".
Write a function get_best_removal_times, that takes the file's contents as a parameter, and returns an array of best removal hours for every valid server log in that file.
Note: that logs can span 1 or many lines.
Again, you should use tests to demonstrate that your solution is correct.
## Example
get_best_removal_times("BEGIN BEGIN \nBEGIN 1 1 BEGIN 0 0\n END 1 1 BEGIN") should return an array: [2]

Store Penalty

# For the purposes of this interview, imagine that we own a store. This
# store doesn't always have customers shopping: there might be some long
# stretches of time where no customers enter the store. We've asked our
# employees to write simple notes to keep track of when customers are
# shopping and when they aren't by simply writing a single letter every
# hour: 'Y' if there were customers during that hour, 'N' if the store
# was empty during that hour.
# For example, our employee might have written "Y Y N Y", which means
# the store was open for four hours that day, and it had customers
# shopping during every hour but its third one.
#   hour: | 1 | 2 | 3 | 4 |
#   log:  | Y | Y | N | Y |
#                   ^
#                   |
#             No customers during hour 3
# We suspect that we're keeping the store open too long, so we'd like to
# understand when we *should have* closed the store. For simplicity's
# sake, we'll talk about when to close the store by talking about how
# many hours it was open: if our closing time is `2`, that means the
# store would have been open for two hours and then closed.
#   hour:         | 1 | 2 | 3 | 4 |
#   log:          | Y | Y | N | Y |
#   closing_time: 0   1   2   3   4
#                 ^               ^
#                 |               |
#          before hour #1    after hour #4
# (A closing time of 0 means we simply wouldn't have opened the store at
# all that day.)
# First, let's define a "penalty": what we want to know is "how bad
# would it be if we had closed the store at a given hour?" For a given
# log and a given closing time, we compute our penalty like this:
#   +1 penalty for every hour that we're *open* with no customers
#   +1 penalty for every hour that we're *closed* when customers would have shopped
# For example:
#   hour:    | 1 | 2 | 3 | 4 |   penalty = 3:
#   log:     | Y | Y | N | Y |     (three hours with customers after closing)
#   penalty: | * | * |   | * |
#            ^
#            |
#          closing_time = 0
#   hour:    | 1 | 2 | 3 | 4 |   penalty = 2:
#   log:     | N | Y | N | Y |      (one hour without customers while open +
#   penalty: | * |   |   | * |       one hour with customers after closing)
#                    ^
#                    |
#                  closing_time = 2
#   hour:    | 1 | 2 | 3 | 4 |   penalty = 1
#   log:     | Y | Y | N | Y |      (one hour without customers while open)
#   penalty: |   |   | * |   |
#                            ^
#                            |
#                          closing_time = 4
# Note that if we have a log from `n` open hours, the `closing_time`
# variable can range from 0, meaning "never even opened", to n, meaning
# "open the entire time".
# 1)
# Write a function `compute_penalty` that computes the total penalty, given
#   a store log (as a space separated string) AND
#   a closing time (as an integer)
# In addition to writing this function, you should use tests to
# demonstrate that it's correct. Do some simple testing, and then quickly
# describe a few other tests you would write given more time.
# ## Examples
# compute_penalty("Y Y N Y", 0) should return 3
# compute_penalty("N Y N Y", 2) should return 2
# compute_penalty("Y Y N Y", 4) should return 1
# Write another function named find_best_closing_time that returns the best closing time in terms of compute_penalty given just a store log. You should use your answer from pt1 to solve this problem.
# Again, you should use tests to demonstrate that it’s correct. Do some simple testing, and then quickly describe a few other tests you would write given more time.
# # Example
# find_best_closing_time("Y Y N N") should return 2
#   // find_best_closing_time("Y Y Y Y Y")<<endl; //4
#   // find_best_closing_time("N N N N N")<<endl; //0
#   // find_best_closing_time("N N Y Y")<<endl; //4/0
#part 3
# We’ve asked our employees to write their store logs all together in the same text file, marking the beginning of each day with BEGIN and the end of each day as END, sometimes spanning multiple lines. We hoped that the file might look like
# "BEGIN Y Y END \nBEGIN N N END"
# which would represent two days, where the store was open two hours each day. Unfortunately, our employees are often too busy to remember to finish the logs, which means this text file is littered with unfinished days and extra information scattered throughout. Luckily, we know that an unbroken sequence of BEGIN, zero or more Y’s or N’s, and END is going to be a valid log, so we can search the aggregate log for those.
# For example, given the aggregate log:
# "BEGIN BEGIN BEGIN N N BEGIN Y Y END N N END"
#                        ^           ^
#                        |           |
#                        +-----------+
#                          valid log
# In this example, We can extract only one valid log, “BEGIN Y Y END”. For our purposes, we should ignore any invalid log. A valid log cannot contain a nested log. (i.e. Valid logs cannot be nested.) Valid logs can span multiple lines. Also there can be multiple valid logs on a single line.
# Write a function get_best_closing_times that takes an aggregate log as a string and returns an array of best closing times for every valid log we can find, in the order that we find them.
#4
# Over time these aggregate log files can get very large, so write another function, get_best_closing_times__v2. This should work like the previous one, but with two important differences:
# Instead of accepting an aggregate log as a string, it should accept a filename (and not the contents of the file.)
# It should take care not to read the whole file’s contents at once, and instead should stream the file as it goes. Assume that the whole file is too large to fit in memory. (You can, however, safely assume that each individual line of the file can fit in memory.)
# In all other respects, this function should behave like get_b‍‌‍‍‍‌‍‌‍‍‌‌‍‍‍‌‍‍est_closing_times, finding the valid log sequences and returning an array of the best closing times corresponding to each.
# # Example
# assuming `myfile.txt` contains "BEGIN BEGIN \nBEGIN N N BEGIN Y Y\n END N N END"
# get_best_closing_times__v2("myfile.txt")
#   should return an array: [2]

Account Balance

https://github.com/stripe-interview/python-interview-prep

地里常见题。这一轮之前面试官会先给你share一个repo（🔗 github.com），让你本地测试下有没有问题，保证之后integration和bug squash不用额外花时间setup，然而之后的面试并不会用到这个repo

https://leetcode.com/problems/optimal-account-balancing/description/

At Stripe we keep track of where the money is and move money between bank accounts to make sure their balances are not below some threshold. This is for operational and regulatory reasons, e.g. we should have enough funds to pay out to our users, and we are legally required to separate our users’ funds from our own. This interview question is a simplified version of a real-world problem we have here.
Let’s say there are at most 500 bank accounts, some of their balances are above 100 and some are below. How do you move money between them so that they all have at least 100?
Just to be clear we are not looking for the optimal solution, but a working one.
Example input:

AU: 80
US: 140
MX: 110
SG: 120
FR: 70
Output:
from: US, to: AU, amount: 20
from: US, to: FR, amount: 20
from: MX, to: FR, amount: 10
写完还有时间，问了followup：
followup1：反过来问，假设给你一系列transfer，问最后account balance是否满足条件。假设所给account balance无论如何也无法做到每个account>=100，问所给的transfer是不是best effort？
followup2：如何得到最优解？这里你需要问面试官如何定义最优解。面试官说转账次数越少越好。这样和蠡口撕柳芜就很像了
followup都只用说思路，不用写

Onsite

Coding

integration

bikemap

Debug

mako

filesystem
comment, ast
exception type
template match

Debug

也是新题。用的是 https://github.com/jbeder/yaml-cpp 。一共2题，据面试官说都是这个repo里实际出现过的bug

parse tag出了问题：yaml tag允许包含括号，但是他给的这个版本有bug，一碰到括号就误以为tag结束了

Debug

https://github.com/square/moshi

System Design

System Design - ledger

Stripe is a payments platform that provides the ability for online businesses to charge money from customers and then get paid out periodically.
For example a Stripe merchant called ShirtyPuff runs a website that sells t-shirts. Every time one of ShirtyPuff’s customers buys a t-shirt we collect money on behalf of the merchant. Periodically we pay the merchant an amount which is calculated by aggregating all the transactions.
There are other teams that take care of building software for actually sending and receiving money.
Your aim is to build a bookkeeping service (called Ledger) that keeps track of money sent and money received on behalf of a merchant. The purpose of this service is to record all financial activity and allow getting the account balance for a given merchant.
The Ledger should support the following operations:

Record money sent or received on behalf of a merchant
Get account balance for a given merchant
What APIs would you build for a system like this?
How would you go about building this system?

https://tianpan.co/notes/166-designing-payment-webhook

System Design - metrics counter

Imagine you work at a startup with many copies of many different services
running in production. You want to help your engineers keep track of metrics
like number of signups, number of requests, number of errors, etc. So you
introduce the concept of "counters". You build a `Track` library that ships
with every service, and anyone can call `Track.increment("some-metric")`
in their code to increment the `some-metric` counter.
This is really useful for quickly spotting anomalies. If you graph the number
of signups over time and it suddenly goes to zero, then we can assume that
something is not going well. This is mostly an operational tool (for example,
we’re not keeping track of precise dollar amounts for accounting or request
counts for rate limiting). Imagine we want to see the number of signups per
minute for the last 6 hours, or how many fraudulent cards were detected hourly
for the last day, or trends in a feature's adoption over the last few weeks.
‍‌‍‍‍‌‍‌‍‍‌‌‍‍‍‌‍‍
Starting from just that interface, design everything that happens after
`Track.increment("some-metric")` gets called on a service, from what happens
inside the service to how the data is sent, collected, stored and accessed.
Think about how to support accessing the data, but don’t worry too much about
any actual graphing UI.

Sample System Diagram

+---------------+
| login-service |--> Track.increment("new-signup")
|               |--> Track.increment("new-login")
+---------------+
+-----------------+
| payment-service |  
|                 |--> Track.increment("fraudulent-card")
+-----------------+
+--------------------+
| some-other-service |--> Track.increment("some-metric")
|                    |--> Track.increment("some-other-metric")
+--------------------+

设计一个仓库系统存储注册与登录的信息

link: https://www.1point3acres.com/bbs/thread-1048108-1-1.html

问的是仓储记录这些登录登出

similar to youtube view