Python

Solving seVeb’s Crackme05

A skill set that I haven’t quite had the chance to foster just yet is that of reverse engineering software. It’s not so much that I lack any fundamental understanding of high-level software development, low-level software concepts, or operating system concepts, but more that I just haven’t taken the time to practice and develop the skill.

I’ve decided it’s time to change that!

One of the ways I’ll be doing this is by working on crackmes. This one in particular is from Crackmes.de user seVeb and is called crackme05. It’s marked as a C/C++ program compiled for Linux and is rated as being very easy for newbies. Sounds like a perfect place to start!


Dumping Map Tiles from an MBTiles Database with Python

MBTiles is a database format, developed by Mapbox, for storing tiled data. It’s a relatively simple database format that allows for a convenient, portable way to store map tile data.

Here recently, I’ve been developing code that works with tiled map data, including data contained within an MBTiles database. As part of this, I’ve needed an easy way to dump the map tiles from an MBTiles database to my local disk. It turns out that we can do this quite easily with a little bit of Python, so let’s dig in!

The MBTiles File Format

The format of an MBTiles database is really pretty simple. In fact, it’s nothing but an SQLite3 database that is formatted in a particular way. This database uses the UTF-8 encoding and contains a few tables for storing the data. For our purposes, however, we only care about the tiles table.

The Tiles Database Table

The tiles database table must be present in an MBTiles database and must contain the following columns:

  • zoom_level : integer
  • tile_column : integer
  • tile_row : integer
  • tile_data : blob

As you can see, the format of the tile data is actually quite simple!
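To make the table layout concrete, here is a minimal sketch that builds a throwaway, in-memory tiles table with just those four columns and reads a row back using Python’s built-in sqlite3 module. The schema is a simplified assumption and the tile blob is placeholder bytes; a real MBTiles file also contains a metadata table.

```python
import sqlite3

# A throwaway in-memory database with a simplified MBTiles-style tiles table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER, "
    "tile_row INTEGER, tile_data BLOB)"
)

# Insert one fake tile; the blob is placeholder bytes, not a real image
conn.execute(
    "INSERT INTO tiles VALUES (?, ?, ?, ?)",
    (1, 0, 1, sqlite3.Binary(b"not-a-real-image")),
)

# Read it back, naming the columns explicitly
row = conn.execute(
    "SELECT zoom_level, tile_column, tile_row, tile_data FROM tiles"
).fetchone()
print(row[:3])  # (1, 0, 1)
conn.close()
```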

One Gotcha

There is one gotcha here that we need to watch out for.

We are commonly familiar with a tile being identified by its (Z, X, Y) coordinates, where Z is the zoom level, X is the column and Y is the row. So, a tile would typically be accessed via a URL of the form Z/X/Y.png. This is not how the data is formatted in the MBTiles database, however.

MBTiles encodes the zoom_level, tile_column, and tile_row according to the Tile Map Service (TMS) Specification. This way of encoding the data is the same in every way, except with regard to the Y-coordinate. In the TMS way of doing things, the Y-coordinate is reversed from the “XYZ” coordinate system mentioned above: TMS counts rows up from the bottom of the map, while XYZ counts them down from the top. The two are related by the following formula, where z is the zoom level:

$$ y_{\text{XYZ}} = 2^{z} - y_{\text{TMS}} - 1 $$

This just means that we need to remember to convert the Y-coordinate in our program.
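As a quick worked check of the formula, here is a tiny sketch; the function name tms_to_xyz_y is hypothetical. At zoom level z there are 2^z rows, so at zoom 2 the TMS rows 0 through 3 map to XYZ rows 3 through 0. Because the formula is symmetric, applying it twice returns the original row, so the same conversion works in both directions.

```python
def tms_to_xyz_y(y, zoom):
    """Flip a tile row between TMS and XYZ numbering at the given zoom level."""
    # There are 2**zoom rows at this zoom level, numbered 0 .. 2**zoom - 1
    return 2 ** zoom - y - 1


# At zoom 2 there are 4 rows; the bottom row is TMS row 0 but XYZ row 3
print([tms_to_xyz_y(y, 2) for y in range(4)])  # [3, 2, 1, 0]

# The flip is its own inverse, so the same formula also converts XYZ back to TMS
assert all(tms_to_xyz_y(tms_to_xyz_y(y, 5), 5) == y for y in range(2 ** 5))
```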

Writing Some Code

Alright, with some of the theory out of the way, let’s actually jump in and start writing some code. We will start by writing the boring boilerplate code that we don’t care about so much.

Boring Boilerplate

import argparse
import logging
import os
import sqlite3

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

# Handle command line args
parser = argparse.ArgumentParser(description="A simple utility to extract files from MBTiles")
parser.add_argument("--input", dest="mbtile_path", help="Path to the mbtile file")
parser.add_argument("--output", dest="output_path", help="Directory to dump tiles to")
args = parser.parse_args()

if not args.mbtile_path or not args.output_path:
    logger.error("You must supply an input and output!")
elif not os.path.isfile(args.mbtile_path):
    logger.error("The input file " + args.mbtile_path + " does not exist.")
elif not os.path.exists(args.output_path):
    logger.error("The output path " + args.output_path + " does not exist.")
else:
    logger.info("Dumping tiles for " + args.mbtile_path + " to " + args.output_path)
    # dump_tiles() must be defined above this point in the final script
    dump_tiles(args.mbtile_path, args.output_path)

We’ll walk through this real quick. I’m not going to spend too much time here since this is just the simple boilerplate code to get the app started.

Lines 1-4 are our imports. We need argparse to parse the command line arguments, logging to output to the console, sqlite3 for reading the MBTiles database (remember, it’s nothing but an SQLite database), and os for working with the files on disk.

On lines 6-7 we set up our logger, and the rest of the code simply handles parsing and validating the command line arguments. As you can see on the last line, the actual task of dumping the data will be done by the dump_tiles(PATH_TO_MBTILES, PATH_FOR_OUTPUT) function. Let’s go ahead and take a look at that function.
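As an aside, argparse can enforce the two flags on its own. The following is a minimal sketch of that alternative, assuming we are happy to let argparse print its usage message and exit when a flag is missing; the checks that the paths actually exist would still be needed. The file names passed in are hypothetical.

```python
import argparse

parser = argparse.ArgumentParser(
    description="A simple utility to extract files from MBTiles")
# required=True makes argparse reject the command line if the flag is missing
parser.add_argument("--input", dest="mbtile_path", required=True,
                    help="Path to the mbtile file")
parser.add_argument("--output", dest="output_path", required=True,
                    help="Directory to dump tiles to")

# Parsing an explicit list here just to show the result; a script would call parse_args()
args = parser.parse_args(["--input", "world.mbtiles", "--output", "tiles"])
print(args.mbtile_path, args.output_path)  # world.mbtiles tiles
```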

Implementing the Functionality

def dump_tiles(mbtile_path, output_path):
    conn = sqlite3.connect(mbtile_path)
    for row in conn.execute('SELECT zoom_level, tile_column, tile_row, tile_data FROM tiles'):
        zoom_level = row[0]
        tile_col = row[1]
        tile_row = row[2]
        tile_data = row[3]
        write_tile(output_path, zoom_level, tile_col, tile_row, tile_data)
    conn.close()

As you can see, our dump_tiles function is pretty straightforward. First, we open the MBTiles database in an SQLite connection. We then loop over every row in the tiles table, storing the tile’s zoom level, column, row, and data in some variables. Next, we call another function, write_tile, passing all of this tile data as parameters. Once every tile has been handled, we close the connection to the SQLite database.

So, with these few lines of code, we’ve managed to fetch all of the tile data from the MBTiles database. As you could probably guess, the tiles are actually written to disk in the write_tile method.

Writing the Tiles to Disk

def write_tile(output_dir, zoom_level, column, row, data):
    row = correct_y_value(row, zoom_level)
    path = os.path.join(output_dir, str(zoom_level), str(column))
    if not os.path.exists(path):
        os.makedirs(path)
    # Assumes JPEG tiles (see the note about PNG at the end of this post)
    with open(os.path.join(path, str(row) + ".jpg"), 'wb') as f:
        f.write(bytearray(data))

At this point, we are just about done. We’ve gotten the tile data from the database, now it’s time to write it out to disk. Let’s walk through this code to see what’s happening.

On line 2 you will notice that we are reassigning the row value to be the output from the correct_y_value(row, zoom_level) method. This is to correct for the differences in the y-values that we mentioned above. We will take a look at that method in a moment.

In lines 3-5, we create the folder structure for the tile, which is output_directory/zoom_level/column. Next, we open the file row.jpg for writing in binary mode, convert the raw tile data into a byte array, and write that binary data to the file. The filehandle is closed once we are done writing.

Fixing the Y-Value

This leaves us with just one last thing to look at, the correct_y_value(row, zoom) method.

def correct_y_value(y, zoom):
    y_max = 1 << zoom
    return y_max - y - 1

All this method is doing is flipping the Y-value, as described in the section above.

That’s It!

That’s all there is to it! With just around 50 lines of Python, we’ve managed to create a command-line application that allows us to dump tile data contained in an MBTiles database to disk. It’s worth mentioning that there are a few things we could do to make this utility more robust. For one, tile data is commonly stored as either JPG or PNG. In our example, we are just assuming it’s stored as JPG data. It would be useful if our utility auto-detected which format to use, but I’ll save that as a future exercise.
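As a possible starting point for that exercise, here is a minimal sketch that guesses the extension from the first few bytes of the tile blob. It assumes tiles are either PNG, which always begins with the eight-byte signature \x89PNG\r\n\x1a\n, or JPEG, which begins with the bytes FF D8 FF; the helper name guess_tile_extension and the ".bin" fallback are hypothetical choices.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def guess_tile_extension(data, default=".bin"):
    """Guess a file extension for a tile blob from its magic bytes."""
    header = bytes(data[:8])
    if header.startswith(PNG_SIGNATURE):
        return ".png"
    if header.startswith(JPEG_SIGNATURE):
        return ".jpg"
    # Unknown format (e.g. WebP or vector tiles): fall back to a generic extension
    return default
```

In write_tile, the hard-coded ".jpg" could then be swapped for guess_tile_extension(data).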

Python – How to Capture Video Feed from Webcam Using OpenCV

Have you ever written code to interface with a webcam? Well, if you have, then you know that it can be a royal pain in the ass. And God forbid you want it to be a cross-platform solution! The good news is that there is a ready-made solution that can help us out: OpenCV. Yes, you heard me right. Not only is OpenCV an amazing computer vision library, but it also provides a handy, cross-platform way of interfacing with webcams. Let’s take a look at how simple OpenCV makes this. I’ll be using Python for these examples, but the API is similar in other languages.

Shut Up and Show Me the Code!

Okay, okay, we’ll take a look at the code already 🙂

import cv2

# Open a handle to the default webcam
camera = cv2.VideoCapture(0)

# Start the capture loop
while True:
    # Get a frame; ret_val is False if no frame could be read
    ret_val, frame = camera.read()
    if not ret_val:
        break

    # Show the frame
    cv2.imshow('Webcam Video Feed', frame)

    # Stop the capture by hitting the 'esc' key
    if cv2.waitKey(1) == 27:
        break

# Release the webcam and dispose of all open windows
camera.release()
cv2.destroyAllWindows()

Not too surprisingly, running this simple script will open up a window displaying a live video feed from the default webcam. The window can be closed by hitting the escape key. I think this code is pretty self-explanatory, so I won’t dive into it here, but feel free to hit me up if you have any questions!

Quickly Generating Primes Below n With the Sieve of Eratosthenes

The uses for prime numbers in computer science are nearly endless. They show up in hashing, cryptography, factorization, and all sorts of applications in between.

There are a great number of algorithms that allow us to quickly generate primes, but today we are going to take a look at a popular method known as a prime sieve. There are a number of different prime sieves, but one of the simplest to implement is known as the Sieve of Eratosthenes. This algorithm is great for quickly generating smaller prime numbers (but it may not be the best choice for generating very large primes).

How it Works

In general, the Sieve of Eratosthenes works by generating a list of numbers from 2 to n. The algorithm will then work through the list, marking all the composite numbers. Here is a more detailed breakdown of the implementation:

  1. Create a list of integers from 2 to n. We start at 2 because it’s the smallest prime.
  2. Set p=2
  3. Iterate over the multiples of p by counting to n from 2p in increments of p. These are the numbers that get marked as composites in the list.
  4. Find the first number greater than p in the list that is not marked. If one does not exist, we are done. If one does exist, however, set p to this new value and repeat from step 3.
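The four steps above map fairly directly onto a list-based implementation. The following is a minimal sketch (the function name sieve_below is hypothetical) that uses a boolean list where index i records whether i is still considered prime. It deviates from the steps in two standard ways: crossing off starts at p*p rather than 2p, since any smaller multiple of p has a smaller prime factor and is already marked, and the search for the next p stops once p*p reaches n, since every composite below n has a prime factor no larger than its square root.

```python
def sieve_below(n):
    """Return a list of all primes strictly less than n (Sieve of Eratosthenes)."""
    if n < 3:
        return []
    # is_prime[i] stays True until i is marked as a composite
    is_prime = [True] * n
    is_prime[0] = is_prime[1] = False

    p = 2
    while p * p < n:
        if is_prime[p]:
            # Mark the multiples of p, starting at p*p, as composite
            for multiple in range(p * p, n, p):
                is_prime[multiple] = False
        p += 1

    return [i for i, flag in enumerate(is_prime) if flag]


print(sieve_below(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```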

This method has a time complexity of $$ O\left(n \log \log n\right) $$.

Implementation in Python

Let’s take a look at how we can implement a Sieve of Eratosthenes in Python:

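def gen_primes():
    """Yield primes indefinitely using an incremental Sieve of Eratosthenes."""
    # D maps each upcoming composite number to the list of primes that divide it
    D = {}
    p = 2

    while True:
        if p not in D:
            # p is not a known composite, so it's prime; the first
            # multiple of p worth tracking is p*p
            yield p
            D[p * p] = [p]
        else:
            # p is composite; move each of its prime divisors q
            # on to q's next multiple
            for q in D[p]:
                D.setdefault(q + p, []).append(q)
            del D[p]
        p += 1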

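For this implementation, I have modified things a bit to yield an infinite prime generator. Rather than sieving a fixed list up to n, it keeps a dictionary that maps each upcoming composite number to the primes that divide it, crossing off multiples lazily as p counts upward.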

Let’s take a moment to consider how this could be used in a practical scenario, such as problem 10 from Project Euler, which asks us to find the sum of all the primes below 2 million. Using our gen_primes() generator, we can easily solve this with the following:

import itertools

primes = gen_primes()
print(sum(itertools.takewhile(lambda x: x < 2000000, primes)))

Another Practical Example

Before I wrap up this post, let’s consider just one more practical use for our gen_primes() method. Assume that we needed to find out what the nth prime is. For the purpose of example, let’s just say we want to find the 500th prime number. It turns out this can be done easily with the following:

import itertools

primes = gen_primes()
# islice counts from 0, so index 499 is the 500th prime
print(next(itertools.islice(primes, 499, None), None))

Running this will reveal that the 500th prime number is 3,571.

Wrap Up

As I hope you can see, the Sieve of Eratosthenes is a simple way to generate prime numbers that can prove useful in a number of situations. I hope you’ve found this helpful!

How to Route Urllib2 Through Tor (Python)

I’ve recently been experimenting with a new project to scrape data from webpages located on the Tor network. For simplicity’s sake, I decided to write this bit of code in Python and use the handy urllib2 library (Python 2 only) to handle the HTTP requests.

For those that don’t know, Tor runs a SOCKS5 proxy, which, by default, listens on 127.0.0.1:9050. I thought things would be as simple as telling urllib2 to use a proxy located at IP 127.0.0.1 and port 9050, but I quickly found that this doesn’t work, since urllib2’s built-in proxy support is meant for HTTP proxies rather than SOCKS.

Luckily, after a bit of digging, I found a solution. It turns out that urllib2 (by way of httplib) relies on the create_connection() function from Python’s socket module. If we take a look at the code for this function, we can see where our problem lies:

def create_connection(address, timeout=_GLOBAL_DEFAULT_TIMEOUT,
                      source_address=None):
    """Connect to *address* and return the socket object.

    Convenience function.  Connect to *address* (a 2-tuple ``(host,
    port)``) and return the socket object.  Passing the optional
    *timeout* parameter will set the timeout on the socket instance
    before attempting to connect.  If no *timeout* is supplied, the
    global default timeout setting returned by :func:`getdefaulttimeout`
    is used.  If *source_address* is set it must be a tuple of (host, port)
    for the socket to bind as a source address before making the connection.
    An host of '' or port 0 tells the OS to use the default.
    """
 
    msg = "getaddrinfo returns an empty list"
    host, port = address
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        sock = None
        try:
            sock = socket(af, socktype, proto)
            if timeout is not _GLOBAL_DEFAULT_TIMEOUT:
                sock.settimeout(timeout)
            if source_address:
                sock.bind(source_address)
            sock.connect(sa)
            return sock
 
        except error, msg:
            if sock is not None:
                sock.close()
 
    raise error, msg

In looking at this we can see that, even though we specified that Tor should be used as our proxy, the create_connection() function will still resolve the hostname locally with getaddrinfo(), using the default DNS settings and hence bypassing the Tor network. Luckily, we can create our own create_connection() function and jerry-rig it into the socket module before we load urllib2. In doing this we can force the DNS request to go through Tor, thus allowing us to route our urllib2 traffic through the Tor network. This can be achieved with the following bit of code:

import socket
import socks  # the SocksiPy / PySocks module

# urllib2 uses the socket module's create_connection() function.
# The way the DNS request is done won't work for our Tor connection,
# so we need to jerry-rig our own create_connection() for urllib2.
# This simple version ignores the timeout and source_address arguments.
def create_connection(address, timeout=None, source_address=None):
    sock = socks.socksocket()
    sock.connect(address)
    return sock

# Set our default proxy to Tor; the final True (rdns) asks the proxy
# to resolve hostnames, so DNS lookups also go through Tor
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9050, True)

socket.socket = socks.socksocket
socket.create_connection = create_connection  # force the socket module to use our create_connection()

import urllib2  # now we can import urllib2 (Python 2 only) :D

With this done, any request that we make using urllib2 will be performed over the Tor network!