Monday, August 14, 2017

Code

I found this puzzle on the internet. It looks quite easy to solve, but if you want to dig deeper into it you can find quite an interesting fact.




Solution (without spoilers):
Let's name those rules with letters as follows:

A 206 twoNumbersAreCorrectButWrongPlaced
B 682 oneNumberIsCorrectAndWellPlaced
C 614 oneNumberIsCorrectButWrongPlaced
D 780 oneNumberIsCorrectButWrongPlaced
E 738 nothingIsCorrect

How many codes are still in the game after applying the rules?

Let's count the valid codes when we apply only two rules:


X     A     B     C     D     E
A    69    14    25    25    42
B    14   192    40    40    72
C    25    40   386   144   152
D    25    40   144   386    78
E    42    72   152    78   343
Now let's focus on AB and another rule:
ABA 14
ABB 14
ABC 1
ABD 7
ABE 10

So if a winner exists, we can focus on rules ABC to find it.

If you want to figure out how to solve it in Java, you can find the answer on GitHub [WARNING: spoiler in tests]:
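The check can also be brute-forced directly. Below is a minimal sketch of my own (not the linked GitHub solution) that tests all 1000 codes against the five rules and prints only how many survive, to stay spoiler-free:

```java
public class CodePuzzle {
    // how many digits of the guess appear anywhere in the code
    static int correct(int[] code, int[] guess) {
        int n = 0;
        for (int g : guess) {
            for (int c : code) if (c == g) { n++; break; }
        }
        return n;
    }

    // how many digits of the guess sit at the same position in the code
    static int wellPlaced(int[] code, int[] guess) {
        int n = 0;
        for (int i = 0; i < 3; i++) if (code[i] == guess[i]) n++;
        return n;
    }

    static boolean valid(int[] c) {
        return correct(c, new int[]{2, 0, 6}) == 2 && wellPlaced(c, new int[]{2, 0, 6}) == 0 // A
            && correct(c, new int[]{6, 8, 2}) == 1 && wellPlaced(c, new int[]{6, 8, 2}) == 1 // B
            && correct(c, new int[]{6, 1, 4}) == 1 && wellPlaced(c, new int[]{6, 1, 4}) == 0 // C
            && correct(c, new int[]{7, 8, 0}) == 1 && wellPlaced(c, new int[]{7, 8, 0}) == 0 // D
            && correct(c, new int[]{7, 3, 8}) == 0;                                          // E
    }

    public static void main(String[] args) {
        int survivors = 0;
        for (int a = 0; a <= 9; a++)
            for (int b = 0; b <= 9; b++)
                for (int d = 0; d <= 9; d++)
                    if (valid(new int[]{a, b, d})) survivors++;
        System.out.println(survivors); // 1 - exactly one code satisfies all five rules
    }
}
```

With this interpretation of the rules (a distinct guess digit counts as "correct" if it appears anywhere in the code), exactly one code survives all five clues.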

Tuesday, April 26, 2016

Global unroll

Spock is a great tool, but my colleague Marcin Zajączkowski invented a way to make it perfect. He published his lib to unroll all tests by default. This is a very nice feature, because it is probably the desired way of working with parameterized tests.
Just add this lib to testCompile:
info.solidsoft.spock:spock-global-unroll:0.5.0
Now you can remove all @Unroll annotations from your tests.
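For reference, assuming a Gradle build, the dependency declaration could look like this (same coordinates as above; on newer Gradle versions the configuration is called testImplementation instead of testCompile):

```groovy
dependencies {
    testCompile 'info.solidsoft.spock:spock-global-unroll:0.5.0'
}
```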

A simple pull request made by the author:
https://github.com/piotrpietrzak/aprcalc/pull/2/files

Sunday, April 24, 2016

Shadows of BigDecimal

We are using BigDecimal in many places: for modeling currencies, financial measures (interest rates, coefficients), and maybe other domain values that need better precision than Double provides.

BigDecimal is the common and straightforward answer to the precision problem, but why does it suck?

The first problem with BigDecimal is the lack of a power function - there is no BigDecimal^BigDecimal. Of course this is not a real problem - you can use some external library or write one on your own. To do so we have to implement e^x, ln(x) and n! - piece of cake. The frustration grows when you try to find a lib for that.
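To give a feeling of what "on your own" means, here is a minimal sketch of e^x for BigDecimal via its Taylor series. This is illustrative only: a real implementation also needs argument reduction (e.g. e^x = (e^(x/2))^2) and a proper error analysis.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class BigExp {
    // Taylor series: e^x = sum over k of x^k / k!  (good for small |x|)
    static BigDecimal exp(BigDecimal x, MathContext mc) {
        BigDecimal term = BigDecimal.ONE; // x^k / k!, starting at k = 0
        BigDecimal sum = BigDecimal.ONE;
        BigDecimal eps = new BigDecimal("1E-" + mc.getPrecision());
        for (int k = 1; k <= 1000; k++) {
            term = term.multiply(x, mc).divide(BigDecimal.valueOf(k), mc);
            sum = sum.add(term, mc);
            if (term.abs().compareTo(eps) < 0) break; // series has converged
        }
        return sum;
    }

    public static void main(String[] args) {
        // e^1 to 30 significant digits
        System.out.println(exp(BigDecimal.ONE, new MathContext(30)));
    }
}
```

With exp in hand, ln can be built via Newton's method and then a^x = e^(x·ln a); each step multiplies the cost, which is exactly where the next problem shows up.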

The second problem is that even if you solve the first one, a new one appears - performance. This problem has accompanied us as a humanity for ages. If we want to perform operations with better precision than float or double offers, we have to struggle with numerical recipes and a lack of low-level support. This one is quite interesting.

What is my understanding of low-level support? Let's go back in time to the 286 (the Intel 286, not A.D. 286). There was no low-level support for floating point operations at all. For the 386SX there was a new option: buy a 387 floating point co-processor to perform floating point operations faster. There were also software emulators of the 387 (to run apps written for a 386DX with an accompanying 387 on a plain 386SX). Later on Intel integrated those two chips into one. Modern 64-bit Intel processors deliver a floating point programming model based on a separate instruction set to perform operations like add and multiply. As we know from a math perspective, that is enough to implement more sophisticated functions like e^x, ln x and others. We also have support for approximating the reciprocal of a floating point number, and therefore we can divide by a float, because division is just multiplication by the reciprocal. Those operations are faster than anything you can implement in plain assembly. But can we benefit from this mechanism in BigDecimal computations?

To answer this question we have to go down to the assembly language. For Java this is surprisingly easy: add "-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly" to the java command line (this needs the hsdis disassembler plugin on the JVM's library path) and we will get the assembly for the parts that were compiled by the JIT. You should get something like this:

Decoding compiled method 0x000000010fdffd90:
Code:
[Entry Point]
[Constants]
  # {method} {0x0000000123bb5310} 'multiply' '(Ljava/math/BigDecimal;)Ljava/math/BigDecimal;' in 'java/math/BigDecimal'
  # this:     rsi:rsi   = 'java/math/BigDecimal'
  # parm0:    rdx:rdx   = 'java/math/BigDecimal'
  #           [sp+0xf0]  (sp of caller)
  0x000000010fdfffe0: mov    0x8(%rsi),%r10d
  0x000000010fdfffe4: shl    $0x3,%r10
  0x000000010fdfffe8: cmp    %rax,%r10
  0x000000010fdfffeb: jne    0x000000010fa45e20  ;   {runtime_call}
  0x000000010fdffff1: data32 data32 nopw 0x0(%rax,%rax,1)
  0x000000010fdffffc: data32 data32 xchg %ax,%ax
(...)
The code for this "simple" multiplication fills a few screens and contains a lot of compare-and-jump loops.
The long trace of this output is not interesting to follow, but there are no signs of floating point operations. It looks like our JIT doesn't find floating point instructions useful for BigDecimal operations.

My personal experience with this effect came when I was implementing a "better" way of computing APR. I had to implement a^x, and therefore I was forced to implement e^x and ln x on my own. Of course it was based on multiplication, division and addition. In the end the results were worse than I expected: for very common APR arguments the computation took more than a second on my i5 CPU, with one core working at 100%. I didn't merge this branch because it rendered my library useless - I can't waste that much CPU time on computing an APR.

The third problem with BigDecimal is the inconsistency of basic operations like comparing one value to another. Let's focus on ZERO. This number is very important because we can't find the reciprocal of ZERO. As a senior math user I was aware that division(x) will fail for x = ZERO, and therefore I was checking if(x.equals(ZERO)) to prevent the division. Unfortunately there is a case when x is not equal to ZERO in terms of the equals method (x = "0E-21"), yet the division still ends with "BigInteger divide by zero". Fortunately there is the compareTo method, which returns -1, 0, or 1 as this BigDecimal is numerically less than, equal to, or greater than val. To solve this issue you should compareTo the value rather than check equals.
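The effect is easy to reproduce; a minimal demonstration:

```java
import java.math.BigDecimal;

public class ZeroShadow {
    public static void main(String[] args) {
        BigDecimal x = new BigDecimal("0E-21"); // numerically zero, but with scale 21
        // equals() compares the unscaled value AND the scale, so this is false:
        System.out.println(x.equals(BigDecimal.ZERO));    // false
        // compareTo() compares numeric values only, so this is 0:
        System.out.println(x.compareTo(BigDecimal.ZERO)); // 0
    }
}
```

So the zero check guarding a division should be x.compareTo(BigDecimal.ZERO) == 0, never x.equals(BigDecimal.ZERO).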

Even though some problems of BigDecimal really annoy me, I find it very useful for modeling math and financial domains. It has its own problems, but I would not implement this kind of functionality better on my own.

Saturday, April 9, 2016

APR Adventure

APR - part I

When you want to compare two things in Java you should implement the Comparable<> interface (in math this is called a total order). You don't have to compute a single value for one object and then a second for the other to compare them. Of course in some cases that approach is even simpler, but it brings another problem - there is no easy way to check if you are comparing apples to apples, because there is no such thing as a BigDecimal<Apple>.

In the real world, when we want to compare sticks we will probably use length as a measure to ease the entire process. For stones we can use the length of the stone as a measure of a "good stick and stone". As you can imagine, using one coefficient for comparing sticks and stones may break my bones.

The EU enforces regulations on every company that lends money to provide a single coefficient that helps consumers compare their offers.

The main problem of this approach is that it is very hard to compare two loans, because we are comparing two cashflows (in Java: Map<LocalDate, BigDecimal>) using one coefficient (Java type: BigDecimal). So we are mapping a map to a scalar and comparing those scalars. As you can imagine, this is a universal way of comparing apples and oranges.

For unknown reasons the EU chose a measure that is not easy to solve numerically, called the Annual Percentage Rate - APR. The defining equation (after Directive 2008/48/EC) is:

Σ(k=1..m) Ak·(1+i)^(-tk) = Σ(k'=1..m') A'k'·(1+i)^(-sk')

Ak is, let's say, the money we pay to our Customer and A'k' is the money paid by the Customer to us.
i in this formula is the APR itself.
How hard can it be to find the value of i from this formula?

Here we have the first problem - there is no simple, generic method/formula to find "i". It is so hard that a lot of companies deliver wrong results to their clients on their websites (which, by the way, is prohibited by law - you have to provide a valid APR according to EU regulations. Those regulations are rewritten in local law and translated into many European languages, but the math stays the same).

Let's redefine this problem a little bit. We have a situation f(x)=g(x), which we can rewrite as f(x)-g(x)=0. This is a more general problem - we have to find the root of an equation expressed as the sum of all payments discounted to the first day of the loan.
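As a minimal sketch of that root finding, bisection is enough once the sign change is bracketed. Everything here is illustrative: double instead of BigDecimal, an assumed ACT/365 day count, and a made-up cash flow - not the actual library code.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Map;

public class AprBisection {
    // f(i): all payments discounted to the first day of the loan.
    // Outflows (money we pay out) are negative, inflows positive.
    static double discountedSum(Map<LocalDate, Double> cashFlow, double i) {
        LocalDate start = cashFlow.keySet().stream().min(LocalDate::compareTo).get();
        return cashFlow.entrySet().stream()
                .mapToDouble(e -> {
                    double years = ChronoUnit.DAYS.between(start, e.getKey()) / 365.0;
                    return e.getValue() / Math.pow(1.0 + i, years);
                })
                .sum();
    }

    // Bisection: assumes f changes sign on [lo, hi].
    static double apr(Map<LocalDate, Double> cashFlow, double lo, double hi) {
        for (int n = 0; n < 100; n++) {
            double mid = (lo + hi) / 2;
            if (discountedSum(cashFlow, lo) * discountedSum(cashFlow, mid) <= 0) {
                hi = mid;   // root is in the lower half
            } else {
                lo = mid;   // root is in the upper half
            }
        }
        return (lo + hi) / 2;
    }

    public static void main(String[] args) {
        // lend 1000, get back 1100 exactly 365 days later -> APR = 10%
        Map<LocalDate, Double> cf = Map.of(
                LocalDate.of(2016, 1, 1), -1000.0,
                LocalDate.of(2016, 12, 31), 1100.0);
        System.out.println(apr(cf, 0.0, 1.0)); // ~0.1
    }
}
```

Bisection is slow but robust; faster methods (Newton, secant) need a well-behaved derivative, which real cash flows do not always provide.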

From the API perspective we have a cashFlow defined as a Map<LocalDate, BigDecimal> and that's it! There is an open source library that performs all those computations and provides results for the mentioned API:

APR calculator

You can find the source code here.

Wednesday, March 2, 2016

Schedulers in couchbase

The Couchbase Java driver by default provides thread pools that are quite interesting. Why "are" instead of "is"? Because there are two of them - one for I/O operations and another one for computations.

Let's focus first on I/O operations.

There is a special I/O thread pool provided by the Couchbase driver to deal with I/O-intensive operations. You can define only the pool size - there is a small hint in the documentation: try to set this value to the number of processors in your system. This default is connected with the way the I/O operations are performed (non-blocking Netty). So far so good - a very easy configuration: only one int parameter to set, with a reasonable default.
There is no option in the API to provide your own I/O thread pool/scheduler/whatever to tune it manually.

We can do a lot more with the Computational Thread Pool. First of all, we can use the default CTP with a manually set computationPoolSize. This thread pool is produced by a slightly modified default Rx Scheduler (which provides custom names like cb-computational-%d).

There is a more interesting thing we can do with the CTP - we can provide our own Rx Scheduler, like this:
https://gist.github.com/piotrpietrzak/9c6f11f8842184eae718

Now we have full control over creating each thread in our application. Why should we struggle with that?

There are multiple good reasons to control this, but here is just one: it lets us propagate the correlationId (aka trace and span in Spring Sleuth). We have to carry our trace and span context (for example via TraceContextHolder) from the thread managed by Spring MVC to the thread provided by our scheduler.
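A JDK-only building block for such a scheduler is an executor with predictably named threads, which you could then wrap (e.g. with RxJava's Schedulers.from) and hand to the environment builder as in the gist above. The names below are illustrative:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedThreads {
    // Thread factory producing predictable thread names - useful when you
    // need to find "your" threads in logs and thread dumps.
    static ThreadFactory named(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4, named("my-computations"));
        String name = pool.submit(() -> Thread.currentThread().getName()).get();
        System.out.println(name); // my-computations-1
        pool.shutdown();
    }
}
```

The thread factory is also the natural place to copy trace/span context from the submitting thread into the worker, which is exactly the correlationId trick described above.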

Monday, February 15, 2016

How to store passwords

Hash function

Let's do some modeling - we will start with the password storage problem in our Identity Management System.

First of all I want to start with a statement: we don't want to store plaintext user passwords. We only want to know that a provided password is valid. The basic idea behind this concept is to create a function that transforms the user password into a less vulnerable, irreversible hash.

At this point I think we are all aware that we should do it this way, and you probably know about the existence of hash functions. Choosing a proper algorithm to provide such a hash is just not so simple.
We can start with a basic model and define the rules for our hash algorithm later on.
PasswordHashProvider - a basic interface to provide one or more hash algorithms - could look like this:

import java.util.UUID;

public interface PasswordHashProvider {
    String encode(String password, UUID salt);

    default boolean verify(String password, UUID salt, String hash) {
        return encode(password, salt).equals(hash);
    }
}

With this simple interface we can model our requirements and implement a few tests.
First of all we want our function to provide properties like:
  • it is quick to compute the hash value for any given message
  • it is infeasible to generate a message from its hash
  • it is infeasible to modify a message without changing the hash
  • it is infeasible to find two different messages with the same hash.
To validate the first property, "it is quick to compute the hash value for any given message", we can generate multiple passwords and fail if our computation lasts a little too long. Let's write a test for this:

@Timeout(value = 100, unit = TimeUnit.MILLISECONDS)
def "should generate hash during 100 milliseconds or faster"() {
 given:
  String simplePassword = "PASSWORD"
 when:
  String result = passwordHashProvider.encode(simplePassword, UUID.randomUUID())
 then:
  !result.isEmpty()
}


The second property of a hash function - "infeasible to generate a message from its hash" - is quite interesting from a mathematical perspective. First of all, this kind of function should be irreversible (I mean there should be no function f^-1 such that f(f^-1(x)) = x). This property is very easy to achieve: every reversible function must be injective, and to break injectivity we can simply assign the same result to multiple inputs. This is the case for our password hash function - you could log in with a different password than the one you defined, but it should be infeasible to find that second one.

To express the third property, "infeasible to modify a message without changing the hash", I wrote a simple test:

def "should generate different hash for same passwords and different salt"() {
 given:
  List hashes = []
  Integer countSum = 0
  String password = "PASSWORD"
 when:
  (1..ITERATIONS).each {
   String hash = passwordHashProvider.encode(password, UUID.randomUUID())
   hashes.add(hash)
   countSum += hashes.count(hash)
  }
 then:
  countSum == ITERATIONS
}
OK - I will explain one trick used here to validate this property. We want to check that a change of salt changes the password hash. Salt is just some known value that is mixed in while hashing passwords to improve their quality and complicate an attacker's life, because rainbow tables (tables with common hash/password combinations) do not work for passwords with a properly added salt.

The second trick of this test is the question: how do we verify that our hashes are unique? The simple idea is to sum the count of each hash among the hashes collected so far and then check if the sum equals the number of iterations. If every hash is unique, each count is 1 and the sum equals the iteration count. If we produced the same hash every time, the counts would grow 1, 2, 3, ... and for, let's say, 100 password hashes we would get the sum of the arithmetic progression: 5050. Therefore the sum is always in the range 100 - 5050.

I couldn't find a way to test the property "it is infeasible to find two different messages with the same hash". We have to assume that violations of this property will be reported by security researchers, as long as we use one of the well-known algorithms.


Well known algorithms:

MD5

This algorithm fails the last property: "it is infeasible to find two different messages with the same hash". You can find the details by searching for the phrase "MD5 is not collision resistant".

Bcrypt

The default password hash algorithm for BSD and other systems, including some Linux distributions.

Scrypt

What a silly name! What should I put into Google to find a Java implementation? "Java scrypt"? :)
This algorithm is characterized by some good properties, like being quite memory-intensive (which is good, because it saves us from farms of Nvidia cores ready to break our computation-intensive algorithms - as a humanity we can't scale memory without huge costs (as of early 2016)).

PBKDF2

After Wikipedia: "PBKDF2 (Password-Based Key Derivation Function 2) is a key derivation function that is part of RSA Laboratories' Public-Key Cryptography Standards (PKCS) series, specifically PKCS #5 v2.0, also published as the Internet Engineering Task Force's RFC 2898."

Argon2

The winner of the Password Hashing Competition (2015).

To sum up

Lots of algorithms to hash passwords exist, but which one is the best, or just good enough? Of course Argon2 is quite interesting: we can define how hard it is on the CPU and on memory, and we can also define the degree of parallelism. For now I can't find a JVM implementation of this algorithm - only bindings to the C lib.

PBKDF2 seems reasonable - it has native Java support (the javax.crypto package). It has quite interesting properties and is widely implemented. It has some troubles, like being CPU- rather than memory-intensive (which in cloud environments is also crucial for us). Let's start with PBKDF2 and move to Argon2 someday (the day I find a JVM implementation with an Apache2-compatible licence).
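A minimal sketch of a PBKDF2-backed encode using only the JDK; the iteration count and key length below are illustrative, not a recommendation:

```java
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.spec.KeySpec;
import java.util.Base64;
import java.util.UUID;

public class Pbkdf2HashProvider {
    public static String encode(String password, UUID salt) throws Exception {
        KeySpec spec = new PBEKeySpec(
                password.toCharArray(),
                salt.toString().getBytes(StandardCharsets.UTF_8),
                10_000,  // iteration count: illustrative, tune for your hardware
                256);    // derived key length in bits
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
        byte[] hash = factory.generateSecret(spec).getEncoded();
        return Base64.getEncoder().encodeToString(hash);
    }

    public static void main(String[] args) throws Exception {
        UUID salt = UUID.randomUUID();
        // PBKDF2 is deterministic: same password + same salt -> same hash
        System.out.println(encode("PASSWORD", salt).equals(encode("PASSWORD", salt)));
    }
}
```

This plugs straight into the PasswordHashProvider interface from above (modulo the checked exception, which a real implementation would wrap).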

Friday, February 12, 2016

Spring sleuth

Correlation Id

When you start with microservices, a natural question comes to mind: how to debug all this stuff? Unfortunately some problems appear only in production environments. How to trace them effectively? The solution is not always simple, but a correlation id should help you debug such things. I will show you the logs and everything should become clear:

2016-02-12 19:48:41.116 [trace=6d9ed076-bb14-42e9-b537-3ae1e85441b0,span=6d9ed076-bb14-42e9-b537-3ae1e85441b0] [o-auto-1-exec-3] o.s.cloud.sleuth.util.ExceptionUtils    : Trace client warn: thread http-nio-auto-1-exec-3 tried to start a new Span with parent MilliSpan(begin=1455302921115, end=0, name=null, traceId=6d9ed076-bb14-42e9-b537-3ae1e85441b0, parents=[], spanId=6d9ed076-bb14-42e9-b537-3ae1e85441b0, remote=true, exportable=true, annotations={}, processId=null, timelineAnnotations=[]), but there is already a currentSpan MilliSpan(begin=1455302921054, end=0, name=http/api/user/januszkowalski, traceId=6d9ed076-bb14-42e9-b537-3ae1e85441b0, parents=[], spanId=6d9ed076-bb14-42e9-b537-3ae1e85441b0, remote=false, exportable=false, annotations={}, processId=null, timelineAnnotations=[])
2016-02-12 19:48:41.119 [trace=6d9ed076-bb14-42e9-b537-3ae1e85441b0,span=17a6a6c2-dba3-4ac1-a23b-2deb6371c08b] [o-auto-1-exec-3] o.s.cloud.sleuth.log.Slf4jSpanListener  : Starting span: MilliSpan(begin=1455302921119, end=0, name=http/api/user/januszkowalski, traceId=6d9ed076-bb14-42e9-b537-3ae1e85441b0, parents=[6d9ed076-bb14-42e9-b537-3ae1e85441b0], spanId=17a6a6c2-dba3-4ac1-a23b-2deb6371c08b, remote=false, exportable=true, annotations={}, processId=null, timelineAnnotations=[])


The first idea is to add a unique id to every request and propagate this id across microservice interactions. This trick lets us trace our logs by this uid. Quite simple, but how do we get this without rewriting the feature from scratch?

In the Approdorix project I started with Spring Sleuth. It was really simple - just one line in application.properties:

logging.pattern.console=%d{yyyy-MM-dd HH:mm:ss.SSS} [trace=%X{X-Trace-Id:-},span=%X{X-Span-Id:-}] [%15.15t] %-40.40logger{39}: %m%n