My Blog: 2011

Thursday, July 7, 2011

Sudoku Solver - The GUI (Part 3 of 4)

[This post is a part of the main post - Sudoku Solver with GUI written in python.]

The Sudoku Solver has a simple and easy to use GUI.

Lets us go through the picture gallery to understand it well -

#1. Startup

This is how the Sudoku Solver looks when you start it -

Picture #1

Note the maze-like colour combination for easy identification of the (specific) 3 x 3 boxes. Also, in the side pane, you have a big white coloured (initially empty) Solution Box and 4 buttons - Save, Clear, Revert, Solve.

#2. Entering numbers -

To enter numbers in the cells, you need to just scroll over the cells. Scrolling over a cell will rotate through its permissible numbers. (Permissible numbers are those numbers which you can enter in the cell without immediately violating the Sudoku rules. The rules may later get violated.)

For example, consider this puzzle -

Picture #2

Observe that for cell 5,5 {row 5, column 5}, the permissible numbers are 1,4,5 and 6. So you can scroll on cell (5,5) to rotate through these numbers (only). You cannot enter any other number in that cell. Thus, the mechanism is such that it automatically prevents you from committing any mistakes while filling the grid.

Picture #3 (Observe the cell (5,5) for each case.)

#3. Save, Clear, Revert -

You can save a puzzle you have entered in the grid using the Save button.

Use the Clear button to clear the grid (and make it again look like Picture #1 above).

To load an already saved puzzle (the one saved using the Save button), use the Revert button.

#4. Solve and the Solution -

Once you have entered a puzzle, click Solve to solve the puzzle. The puzzle will be solved almost immediately. You can go through the solution (steps taken to solve the puzzle) in the Solution Box.

Picture #4

Picture #5

(Completely understanding the solution from the Solution Box is not easy and may take time. You will understand it better if you have gone through this post - How To Solve A Sudoku?)

Don't worry if you forget any of the above. You can get these instructions in the Sudoku Solver application too - In the menu bar, simply go to Help -> How To Use?

(In the above pictures you cannot see the menu bar, because these screen shots are from Ubuntu 11.04 in which the application menu bar is integrated with the global menu bar at the top.)

#5. Additional -

You can make the application span the entire screen using F11 (or from the menu bar, View -> Fullscreen.)

Picture #6

Also, when the mouse moves into the region of a (non empty) cell, the number within pops up. This can be noticed in the above picture too.

How to solve a Sudoku puzzle? (Part 2 of 4)

[This post is a part of the main post - Sudoku Solver with GUI written in python.]

This post explains the 'logic' required to solve Sudoku puzzles.

(These are also exactly the same steps I have followed in my program, Sudoku Solver, to solve Sudoku puzzles.)

Sudoku rules -

The objective of a Sudoku puzzle is to enter a number from 1-9 in each cell, in such a way that:

Each (horizontal) row contains each number exactly once.
Each (vertical) column contains each number exactly once.
Each of the 9* 3x3 boxes (or subgrids) contain each number exactly once.

(* - not any 3x3 box. More info here.)

Steps involved in solving Sudoku puzzles -

(Note: The steps described below are the steps I have followed in my program to solve Sudoku puzzles. There can be more ways to do the same.)

(If you are trying to understand the logic.py module (here) -
In the examples below, I have provided the name of the function (ie either Step1 or Stepk) in square brackets which contains the logic to solve that example,.)

Lets go through some easy steps first.

In the very first approach, we will try to solve the puzzle 'exactly' as far as possible. This is the simplest approach and the puzzle won't be significantly solved in most cases.

Rule #1:
If there is only 1 permissible number in a cell, enter that number in the cell.
(Permissible numbers are those numbers which you can enter in the cell without immediately violating the Sudoku rules. The rules may later get violated.)

Example #1 [Step1 (algorithm 1)]

In a row, since the only numbers can be 1 - 9 and none can repeat, it is clear that, in the picture below, the only permissible number in the cell (2,5) {row 2, column 5} is '5'.

(Puzzle #1)

Example #2 [Step1 (algorithm 1)]

Similarly, in a column, since the only numbers can be 1 - 9 and none can repeat, it is clear that, in the picture below, the only permissible number in the cell (5,7) {row 5, column 7} is '5'.

(Puzzle #2)

Example #3 [Step1 (algorithm 1)]

Similarly, in a box, since the only numbers can be 1 - 9 and none can repeat, it is clear that, in the picture below, the only permissible number in cell (2,2) {row 2, column 2} is '5'.

(Puzzle #3)

(There can be more examples based on the same rule.)

Lets proceed 1 step further.

Even in the next approach, we will try to solve the puzzle 'exactly' as far as possible. However, this is not easily visible on the very first glance. Nevertheless, it is still easy!

Rule #2:
If a number is permissible in only one of the cells in a row, then enter that number in that cell. That cell may then have more permissible numbers. Similarly, for column and box. (Below is an example for a box.)

Example #4 [Step1 (algorithm 2)]

Have a look at the puzzle below. Try to analyse the cell (2,8) {row 2, column 8}.

(Puzzle #4)

You will find that all numbers 1 through 9 are permissible for that cell. However, if you look carefully, you will find that it is the only cell in its box which can hold the number 1. Hence, we can be absolutely sure that the number in this cell should be 1.

(There can be more examples based on the same rule.)

You can practise some 'easy' level examples here to memorize the above 2 approaches.

Now, for something difficult.
Until now we solved everything 'exactly'. However, in most situations these approaches alone cannot take us much far. What we have to resort to then is assumptions! This is an approach with the help of which you can solve any Sudoku puzzle. It is like a universal tool (रामबाण).

Rule #3:
When unable to solve any further using the above approaches, make an assumption, and then try to solve it further using the above approaches. If at some point Sudoku rules get violated, then it implies that your assumption was incorrect. If so, revert back to the point where you took the assumption, take the other assumption (on the same cell) and again proceed further as before. Otherwise, if you don't encounter any rule violations till the end, your assumption was correct (and you got lucky!).
This will be more clear from the example below.

(Note: This approach is not actually difficult in the true sense, it is just laborious. Hence it is better to use this after Rule #1 and Rule #2. However, you can always this before Rules 1 and 2 if you want to!)

Example #5 [Stepk (AssumptionLevel 1)]
Observe the puzzle below. Let us try to solve this using Rule #3. Observe the cell (7,8) {row 7, column 8}.

(Puzzle #5)

You may notice that, for now, the only permissible numbers in that cell are 5 and 9. We cannot directly say at this point which one is correct (but only among them is correct!). This is where we decide to take an assumption. Let us assume the number 9 in that cell.

Now we will try to solve it further using our first 2 'exact' approaches. It can be immediately seen that, using Rule #2, the number in cell (3,7) {row 3, column 7} should be 9.

Similarly, again using Rule #2, the number in cell (2,3) {row 2, column 3} should be (again) 9.

Now, if you look carefully, you will find that there is some problem. You cannot fit 9 anywhere in the box (2,1) without violating the Sudoku rules. But this itself is a violation of the 3rd rule - "Each of the 9 3 x 3 boxes (or subgrids) contain each number exactly once." But where did we go wrong?

- at the point of assumption! We wrongly assumed the number in the cell (7,8) {row 7, column 8} (in Puzzle #5) to be 9. Although we were not sure about it then, we can now be sure that the number in the cell (7,8) {row 7, column 8} in Puzzle #5 should be 5!
(Note: This is the reason why we made an assumption on a cell which then had only 2 permissible numbers. Had we made an assumption on some other cell which had more than 2 permissible numbers, we would have not been able to deduce the correct number even at this stage!)

Now that we have the correct number in cell (7,8) {row 7, column 8}, we try to solve it further using Rules 1 and 2. You will notice that the puzzle now easily gets completely solved.

This might take some time to understand. You can practise some 'medium' level examples here to better understand this approach.

There still remains 1 unanswered question -

How to decide on which cell an assumption must be made?
You can make an assumption on any cell as long as it has only exactly 2 permissible numbers. It is quite possible that you may not be able to make a decision (about the correct number in a cell) on the very first assumption itself (and this is what usually happens), where both the assumptions (on that particular cell) might seem to be true. In this case, you need to leave that cell and try for an assumption on some other cell.

The True Picture -
99.9% of the Sudoku puzzles can be solved using this approach. For the rest 0.1% you need to resort to assumptions over assumptions. This is again not difficult, but it becomes too laborious and should be kept as a measure of last resort after you have tried to solve the puzzle by making assumptions on all cells (with exactly 2 permissible numbers) and still you are not able to solve the puzzle completely.
Don't worry, you normally won't come across any puzzle from this 0.1% category!
(Not even the hard/extreme/evil Sudoku puzzles on most websites fall in this category!)

I denote the nesting of assumptions by their level. In the above puzzle #5, we made only 1 assumption and solved the rest of the puzzle exactly (ie using no more assumptions.) This assumption is denoted as an assumption of Level 1. If you are taking an assumption on an assumption (and no more assumptions), the later assumption is an assumption of Level 2. Similarly there can be more levels.

Remember the link to one of the toughest puzzles I gave at the beginning of the main post? (if not, it is here).
To solve it completely, it requires an assumption of Level 3!

Sudoku Solver with GUI written in python (Part 1 of 4)

Sudoku Solver is a small graphical app for solving any given Sudoku puzzle, almost instantaneously. It also shows the steps required to solve the same.

(Have a look at one of the toughest Sudoku puzzles here. Sudoku Solver can solve it too, and it is the toughest one I have found till date!)

This is how Sudoku Solver looks like -

For what audience is this post intended?

Although I have primarily written this post for sharing my python code, this can be useful for you if -

You want to learn how to solve a Sudoku puzzle by hand.
You want to yourself make a Sudoku solving program/application.
You want to see some large, fairly complicated, working python code.
You don't care how the Sudoku Solver works, you just want to use it to solve some of your Sudoku puzzles.

OK, I am interested. Give me some background of Sudoku.

Sudoku is a logic based placement puzzle. A Sudoku puzzle consists of a 9 x 9 grid cells which are divided into 9 rows, columns and subgrids (as shown above). The task is to place the numbers from 1 to 9 in the empty cells in such a way that, in every row, column and 3 × 3 subgrid, each number appears only once.

How to solve a Sudoku puzzle?

Although the rules of Sudoku are quite simple, solving it can sometimes become a tough task.
I have posted this separately with lots of pictorial examples here - How To Solve A Sudoku puzzle?

So what can this program do? Show me some examples.

It can solve any Sudoku puzzle, almost instantaneously.

Example -

Take any Sudoku puzzle you like, from anywhere - newspapers, websites, etc. (For this example, I have taken the one from here.) Fill up the Sudoku Solver accordingly (How do I use the Sudoku Solver?).

Click on Solve. The puzzle will be solved instantly. Note that it shows the steps required to solve the puzzle in the side pane.

(Note the time taken to solve the puzzle - 0.05 seconds!)

What's different in this?

Mostly on the web, you will find either of these -

Lots of websites having tonnes of Sudoku puzzles to solve and their final answers (but they don't show the steps required to solve these puzzles).
Some Sudoku solving apps/applets/programs, but they can solve only the simpler ones.

With my app, you can -

Solve any Sudoku puzzle.
Also see the steps required to solve the same.

OK, so where is the source code?

The project is hosted on GitHub (version 1.2 onwards).
If you simply wish to download the source code, see the next section.

How do I run/install Sudoku Solver?

Make sure you have Python 2.7 installed.
(On Linux: sudo apt-get install python; Windows/Mac: download)
Make sure you have PyQt4 installed.
(On Linux: sudo apt-get install python-qt4; Windows/Mac: download)
Click here to download the source files. Extract it.
Simply execute the module sudoku_solver.py in a terminal-
```
$ python sudoku_solver.py
```

To learn to use the GUI, refer to the post - Sudoku Solver - The GUI.

Coming to the programming part, how is the program structured?

The program consists of 2 parts -

The Graphical User Interface (GUI).
The logic required to solve (any) Sudoku puzzle.

Tell me more about the GUI.

I have fully described the Sudoku Solver GUI with pictorial examples in this post - Sudoku Solver - The GUI.

____________________________________________________

I started programming this in mid-May, 2011. It took around 10 days for me to complete it. A month later I decided to document it (which took much more time). Finally, I decided to host this as a project on on PyPI. It was fun! Now it is hosted on GitHub (version 1.2 onwards).

Monday, June 27, 2011

Nested tar archives extractor (.tar/ .tgz/ .tbz/ .tb2/ .tar.gz/ .tar.bz2) written in python

I recently wrote a small script in python which can be used to recursively extract nested tar archives about which I have discussed in this post (in quick question-answer format).

For what audience is this post intended?

I have primarily written this post for sharing the program so it is mainly for those who have some background in python programming. However, if you just want to use this as a utility, then you can simply skip to the download part.

(This post has been written assuming that the underlying Operating System is Linux. However, it should be applicable for Windows too.)

So, lets begin...

What is a nested tar archive?

It is a tar archive containing other tar archives which may further contain many more tar archives (and so on...)

So what does this program do?

It extracts tar archives recursively. (What/How? It can be made clear from the examples below.)

What's different in this?

Ordinary extractors normally just extract a tar archive once, ie they won't extract any other tar archives (if any) that are present in it. If it has more tar archives and you want to extract them too, then you have to yourself extract each of these archives. This can be a real headache if there are many tar archives (and are nested many levels deep). I have tried to make this thing easy using this tool.

Can you show me some examples?
Yes, why not.

Example #1
If this is the directory structure of a tar archive -

parent/
    xyz.tar/
        a
        b
        c
        d.tar/
            x
            y
            z

After extracting using a regular extractor -

parent/
    xyz.tar
    xyz/
        a
        b
        c
        d.tar/       # this was not extracted.
            x
            y
            z

After extracting using my extractor -

parent/
    xyz.tar
    xyz/
        a
        b
        c
        d/
            x
            y
            z

(Structure of the tar archive xyz.tar)

(On extracting using a regular extractor. Note that 'd.tar' is still not extracted.)

(On extracting using my extractor. Note 'd.tar' has been extracted to the folder 'd')

Example #2

While extracting, my extractor also takes care about not replacing/overwriting already existing folders.

So if this is the directory structure of a .tar archive -

parent/
    xyz.tar/
        a
        b
        c
        d.tar/
            x
            y
            z
        a.tar/       # note if I extract this directly, it will
                     # replace/overwrite the contents of folder 'a'.
            m
            n
            o
            p

After extracting using my extractor -

parent/
    xyz.tar
    xyz/
        a
        b
        c
        d/
            x
            y
            z
        a 1/          # extracted 'a.tar' to the folder 'a 1' as
                      # folder 'a' already exists in the same folder.
            m
            n
            o
            p

What do I need on my PC to run this program?

Either python 2.6 or python 2.7. I have written this program in python 2.7 and it requires no extra packages to be installed. It should also work on prior python 2.x versions but I have not tested it on any of these.

Is the program easy to understand?
Yes, it is a very simple. Also, I have provided plenty of documentation (comments) at every step in my code below, so understanding it won't be difficult at all. In fact, you can easily customize it later to suit your needs.

OK, so where is the code?
Here it is -

#! /usr/bin/env python
# -*- coding: UTF-8 -*-

"""A command line utility for recusively extracting nested tar archives."""

__author__ = "Pushpak Dagade (पुष्पक दगड़े)"
__date__   = "$4 July, 2011 3:00:00 PM$"

import os
import sys
import re
import tarfile
from argparse import ArgumentParser

major_version = 1
minor_version = 1
error_count = 0

file_extensions = ('tar', 'tgz', 'tbz', 'tb2', 'tar.gz', 'tar.bz2')
# Edit this according to the archive types you want to extract. Keep in
# mind that these should be extractable by the tarfile module.

__all__ = ['ExtractNested', 'WalkTreeAndExtract']

def FileExtension(file_name):
    """Return the file extension of file

    'file' should be a string. It can be either the full path of
    the file or just its name (or any string as long it contains
    the file extension.)

    Example #1:
    input (file) -->  'abc.tar.gz'
    return value -->  'tar.gz'
    
    Example #2:
    input (file) -->  'abc.tar'
    return value -->  'tar'
    
    """
    match = re.compile(r"^.*?[.](?P<ext>tar[.]gz|tar[.]bz2|\w+)$",
      re.VERBOSE|re.IGNORECASE).match(file_name)

    if match:           # if match != None:
        ext = match.group('ext')
        return ext
    else:
        return ''       # there is no file extension to file_name

def AppropriateFolderName(folder_fullpath):
    """Return a folder (path) such that it can be safely created in
    without replacing any existing folder in it.

    Check if the folder folder_fullpath exists. If no, return folder_fullpath
    (without changing, because it can be safely created
    without replacing any already existing folder). If yes, append an
    appropriate number to the folder_fullpath such that this new folder_fullpath
    can be safely created.

    Examples:
    folder_name  = '/a/b/untitled folder'
    return value = '/a/b/untitled folder'   (no such folder already exists.)

    folder_name  = '/a/b/untitled folder'
    return value = '/a/b/untitled folder 1' (the folder '/a/b/untitled folder'
                                            already exists but no folder named
                                            '/a/b/untitled folder 1' exists.)

    folder_name  = '/a/b/untitled folder'
    return value = '/a/b/untitled folder 2' (the folders '/a/b/untitled folder'
                                            and '/a/b/untitled folder 1' both
                                            already exist but no folder
                                            '/a/b/untitled folder 2' exists.)
                                        
    """
    if os.path.exists(folder_fullpath):
        folder_name = os.path.basename(folder_fullpath)
        parent_fullpath = os.path.dirname(folder_fullpath)
        match = re.compile(r'^(?P<name>.*)[ ](?P<num>\d+)$').match(folder_name)
        if match:                           # if match != None:
            name = match.group('name')
            number = match.group('num')
            new_folder_name = '%s %d' %(name, int(number)+1)
            new_folder_fullpath = os.path.join(parent_fullpath, new_folder_name)
            return AppropriateFolderName(new_folder_fullpath)
            # Recursively call itself so that it can be check whether a
            # folder with path new_folder_fullpath already exists or not.
        else:
            new_folder_name = '%s 1' %folder_name
            new_folder_fullpath = os.path.join(parent_fullpath, new_folder_name)
            return AppropriateFolderName(new_folder_fullpath)
            # Recursively call itself so that it can be check whether a
            # folder with path new_folder_fullpath already exists or not.
    else:
        return folder_fullpath

def Extract(tarfile_fullpath, delete_tar_file=True):
    """Extract the tarfile_fullpath to an appropriate* folder of the same
    name as the tar file (without an extension) and return the path
    of this folder.

    If delete_tar_file is True, it will delete the tar file after
    its extraction; if False, it won`t. Default value is True as you
    would normally want to delete the (nested) tar files after
    extraction. Pass a False, if you don`t want to delete the
    tar file (after its extraction) you are passing.

    """
    try:
        print "Extracting '%s'" %tarfile_fullpath,
        tar = tarfile.open(tarfile_fullpath)
        extract_folder_fullpath = AppropriateFolderName(tarfile_fullpath[:\
          -1*len(FileExtension(tarfile_fullpath))-1])
        extract_folder_name = os.path.basename(extract_folder_fullpath)
        print "to '%s'..." %extract_folder_name,
        tar.extractall(extract_folder_fullpath)
        print "Done!"
        tar.close()
        if delete_tar_file: os.remove(tarfile_fullpath)
        return extract_folder_name

    except Exception:
        # Exceptions can occur while opening a damaged tar file.
        print '(Error)\n(%s)' %str(sys.exc_info()[1]).capitalize()
        global error_count
        error_count += 1

def WalkTreeAndExtract(parent_dir):
    """Recursively descend the directory tree rooted at parent_dir
    and extract each tar file on the way down (recursively)."""
    try:
        dir_contents = os.listdir(parent_dir)
    except OSError:
        # Exception can occur if trying to open some folder whose
        # permissions this program does not have.
        print 'Error occured. Could not open folder %s\n%s'\
          %( parent_dir, str(sys.exc_info()[1]).capitalize())
        global error_count
        error_count += 1
        return

    for content in dir_contents:
        content_fullpath = os.path.join(parent_dir, content)
        if os.path.isdir(content_fullpath):
            # If content is a folder, walk down it completely.
            WalkTreeAndExtract(content_fullpath)
        elif os.path.isfile(content_fullpath):
            # If content is a file, check if it is a tar file.
            if FileExtension(content_fullpath) in file_extensions:
                # If yes, extract its contents to a new folder.
                extract_folder_name = Extract(content_fullpath)
                if extract_folder_name:     # if extract_folder_name != None:
                    dir_contents.append(extract_folder_name)
                    # Append the newly extracted folder to dir_contents
                    # so that it can be later searched for more tar files
                    # to extract.
        else:
            # Unknown file type.
            print 'Skipping %s. <Neither file nor folder>' % content_fullpath

def ExtractNested(tarfile_fullpath):
    extract_folder_name = Extract(tarfile_fullpath, False)
    if extract_folder_name:         # if extract_folder_name != None
        extract_folder_fullpath = os.path.join(os.path.dirname(
          tarfile_fullpath), extract_folder_name)
        WalkTreeAndExtract(extract_folder_fullpath)
        # Given tar file is extracted to extract_folder_name. Now descend
        # down its directory structure and extract all other tar files
        # (recursively).
        
if __name__ == '__main__':
    # Use a parser for parsing command line arguments
    parser = ArgumentParser(description='Nested tar archive extractor %d.%d'\
      %(major_version,minor_version))
    parser.add_argument('tar_paths', metavar='path', type=str, nargs='+',
      help='Path of the tar file to be extracted.')
    extraction_paths = parser.parse_args().tar_paths
    
    # Consider each argument passed as a file path and extract it.
    for argument in extraction_paths:
        if os.path.exists(argument):
            #print       # a blank line
            ExtractNested(argument)
        else:
            print 'Not a valid path: %s' %argument
            error_count += 1
    if error_count !=0: print '%d error(s) occured.' %error_count

(download source files from PyPI)

What coding conventions does the program follow?

I have followed PEP 8 coding style and PEP 257 docstring conventions while writing the program (with 1 small exception - I have used maximum line width of 80 characters while PEP 8 suggests a maximum of 79 characters.)

OK, now I have downloaded the source files.
How do I use it to extract 'nested tar archives'? OR
How do I access it from the terminal?
Simply follow these steps -

Ensure you have either python 2.6 or python 2.7 installed.
Extract the downloaded file "nested.tar.archives.extractor-1.1 [python 2.x].tar.gz"
Copy the file "extractnested.py" to one of the folders in your PATH environment variable
(Note: For linux users, as you might know, to allow execution of extractnested.py as a bash script, you need to grant it permissions - $ chmod ugo+rx extractnested.py)
Now you can extract any tar archive from the terminal -
```
$ extractnested.py path [path ...]
```
where path is the path of the tar archive you want to extract.

(Extracting the tar archive 'xyz.tar')

You can also extract more than 1 archives in one command.

(Extracting 'xyz.tar' and 'abc.tar' in the same command.)

Any future plans for development?
Well ....yes. I plan to extend this utility for extracting other archives like zip, 7z, rar, etc so it can be an 'all in one extraction utility'. Also I intend to add this to context menus so that one can simply right click on any compressed archive and extract it with a single click!

____________________________________________________

I actually wrote this program in response to this question asked on StackOverflow which then had an active bounty of 50 rep. points (StackOverflow users will understand what I am saying!) in the greed of winning the bounty. So I worked on the program for 2 days and then posted it on StackOverflow. Unfortunately, even after a few days the user did not reply and I did not get any bounty, not even a single rep. point. So, upset, I decided to upload this program (with slight changes) to PyPI and also here on the blog.

Now, coincidentally or not, I don't know, that night after I uploaded this to PyPI and also posted it here, the user replied and rewarded me with the bounty! I got 75 rep. points! Believe me, there was no way the user knew I will be posting this here on the blog! When I saw this the next morning (that I won the bounty), my joy knew no bounds!
That day, I did not do any work in the office (@internship) as I was busy editing this post!