c++ - stdio very slow
- Heinz Saathoff (16/16) Aug 12 2004 Hello,
- Walter (4/20) Aug 12 2004 It could be that the optimal buffer size you need is not the default one
- Heinz Saathoff (9/33) Aug 13 2004 I only do a sequential read using fgetc. As far as I know stdio also
- Jan Knepper (10/41) Aug 13 2004 How much are you reading at once?
- Heinz Saathoff (54/66) Aug 16 2004 When using stdio I read one char at a time with fgetc. But internally
- Scott Michel (8/25) Aug 16 2004 Jan's point is that a function call to fgetc() has a lot more overhead
- Heinz Saathoff (15/27) Aug 17 2004 That fgetc() has much overhead is true, but I wasn't sure why it's
- Scott Michel (20/31) Aug 17 2004 Well, you could purchase the CD and look at the code. :-)
- Heinz Saathoff (10/43) Aug 18 2004 I already have. That's why I have the sources.
- Scott Michel (7/9) Aug 18 2004 Google for the linux kernel patch that modifies the kernel at run-time
- Walter (5/37) Aug 13 2004 Try setting the buffer size larger, not smaller, and make it a multiple ...
- Heinz Saathoff (7/26) Aug 16 2004 Hello Walter,
- Walter (3/9) Aug 16 2004 fgetc also must do thread synchronization.
Hello,

I wrote a small program (NT console app) to search for filenames in my Eudora mailbox files. My first attempt used the stdio functions (fopen, fgetc, fclose), with the files opened in binary mode. The resulting program worked but took a long time to run. OK, that might be my straightforward, naive algorithm. Then I experimented with memory-mapped files, and the otherwise identical program was very fast. OK, that might be the OS overhead, so I tried the (_open, _read, _close) functions with my own small buffering (1K buffer). It was a bit slower than the memory-mapped approach but still very fast. Here are the times measured:

  stdio : 14.8 seconds
  _read :  1.8 seconds
  mmap  :  0.8 seconds

Not that it matters for my small program, but I'm wondering why the stdio fgetc function takes so much time.

- Heinz
Aug 12 2004
It could be that the optimal buffer size you need is not the default one used by stdio.
Aug 12 2004
Hello Walter,

Walter wrote ...
> It could be that the optimal buffer size you need is not the default one used by stdio.

I only do a sequential read using fgetc. As far as I know, stdio also uses a buffer of at least 512 bytes. Decreasing the buffer size from 1024 bytes to 512 bytes in my (_open, _read, _close)-buffered version increases the runtime from 1.8 seconds to 1.95 seconds. That's not much for the smallest practical buffer size.
Aug 13 2004
Heinz Saathoff wrote:
> I only do a sequential read using fgetc. As far as I know, stdio also uses a buffer of at least 512 bytes.

How much are you reading at once? fgetc only does 1 character per call. (_)read usually does more. Calling the buffered I/O system for every single character or once for a block of 128 does make a difference.

--
ManiaC++
Jan Knepper

But as for me and my household, we shall use Mozilla... www.mozilla.org
Aug 13 2004
Hello Jan,

Jan Knepper wrote...
> How much are you reading at once? fgetc only does 1 character per call. (_)read usually does more.

When using stdio I read one char at a time with fgetc. But internally stdio uses a buffer too. My simple buffered file is this:

------------------- buffered file --------------------------
#include <io.h>      // _open, _read, _close
#include <fcntl.h>   // _O_RDONLY, _O_BINARY
#include <stdio.h>   // EOF

class CppFILE {
public:
    // -1 marks "no file open"; 0 would be a valid handle
    CppFILE() : fhandle(-1), idx(0), filled(-1) {}
    CppFILE(const char *name, const char *mode)
        : fhandle(-1), idx(0), filled(-1) {
        Open(name, mode);
    }
    ~CppFILE() { Close(); }

    // mode is ignored: the file is always opened read-only, binary
    bool Open(const char *name, const char *mode) {
        fhandle = _open(name, _O_RDONLY|_O_BINARY);
        return fhandle >= 0;
    }
    void Close() {
        if(fhandle >= 0) {
            _close(fhandle);
            fhandle = -1;
            idx = 0;
            filled = -1;
        }
    }
    int getc();

protected:
    void Fill();

    int fhandle;
    unsigned char buffer[4096];
    int idx, filled;   // filled < 0: nothing has been read yet
};

void CppFILE::Fill()
{
    // Refill only before the first read or after a full block;
    // a short read means the end of the file was reached.
    if( fhandle >= 0 && (filled < 0 || filled == (int)sizeof(buffer)) ) {
        filled = _read(fhandle, buffer, sizeof(buffer));
        //printf("Fill: read %d\n", filled);
        idx = 0;
    }//if
}

int CppFILE::getc()
{
    if(idx < filled) return buffer[idx++];
    Fill();
    if(idx < filled) return buffer[idx++];
    return EOF;
}
---------------- end buffered file -------------------------

Instead of fgetc I used infile.getc() to read a single char. I thought that fgetc would do its buffering in a similar way. But it seems that fgetc() does much more than my simple getc(). I think it's time to look at the sources of stdio.

- Heinz
Aug 16 2004
Jan's point is that a function call to fgetc() has a lot more overhead associated with it than incrementing a pointer. The test would be a little better balanced if you benchmarked fread() against _read().

In both the _read() and the memory-mapped file case, you're reading into a buffer and (presumably) using a character pointer to examine each character in the buffer. This will always be faster than fgetc(), even if fgetc() is inlined.
Aug 16 2004
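The fread() comparison suggested above could look roughly like the following. This is only a sketch, not code from the thread: the helper names count_byte_blockwise and count_byte_fgetc and the 4096-byte block size are assumptions made for illustration. Both functions compute the same count; the first calls into stdio once per block, the second once per character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not from the thread): counts occurrences of `target`
// by pulling the stream through fread() in 4096-byte blocks and then
// scanning each block with a plain index.
long count_byte_blockwise(std::FILE* f, unsigned char target) {
    unsigned char buf[4096];
    long count = 0;
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) {
        for (std::size_t i = 0; i < n; ++i) {
            if (buf[i] == target) ++count;
        }
    }
    return count;
}

// Same result, but with one fgetc() call per character, for comparison.
long count_byte_fgetc(std::FILE* f, unsigned char target) {
    long count = 0;
    int c;
    while ((c = std::fgetc(f)) != EOF) {
        if (c == target) ++count;
    }
    return count;
}
```

Timing both functions over the same large file would show how much of the 14.8 seconds is per-call cost rather than disk I/O; the actual ratio depends on the runtime library and is not something this sketch can predict.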
Hello Scott,

Scott Michel wrote...
> Jan's point is that a function call to fgetc() has a lot more overhead associated with it than incrementing a pointer. The test would be a little better balanced if you benchmarked fread() against _read().

That fgetc() has much overhead is true, but I wasn't sure why it's nearly a factor of 10 slower than my primitive buffering approach. Walter told me that fgetc has to be aware of multithreading. There will be some error handling too. All this is overhead. When I find some time I will have a look at the sources and see what happens.

> In both the _read() and the memory-mapped file case, you're reading into a buffer and (presumably) using a character pointer to examine each character in the buffer. This will always be faster than fgetc(), even if fgetc() is inlined.

If fgetc() were implemented the way I did it in my simple buffering file wrapper, it would be as fast as my version. As you said, fgetc() does more than just pick a char from a buffer and increment a pointer. I didn't expect this overhead in the first place, but now I know not to use fgetc() in time-critical applications.

- Heinz
Aug 17 2004
Heinz Saathoff wrote:
> That fgetc() has much overhead is true, but I wasn't sure why it's nearly a factor of 10 slower than my primitive buffering approach. Walter told me that fgetc has to be aware of multithreading.

Well, you could purchase the CD and look at the code. :-)

fgetc() for 32-bit is handcoded assembly (see src/CORE32/FPUTC.ASM). It's about as fast as you're going to get it to run. It does deal with multithreaded locking of the file descriptor, which is where the slowdown occurs. If you look at the code for LockSemaphoreNested, you'll see a LOCK-prefixed instruction -- this is **really** slow because it forces a lot of synchrony in the Pentium pipeline. It's the most conservative way of doing locking if you can't do self-modifying code or don't offer processor-specific versions of the RTL.

Of course, you can create your own CPU-specific version of the RTL because the build system and the code are available from the CD. For example, you can call CMPXCHG or XADD instead of LOCK INC (because the lock semantics are implied). If you're not in an SMP environment, you can simply use MOV so long as it's an aligned MOV (guaranteed atomic).

> If fgetc() were implemented the way I did it in my simple buffering file wrapper, it would be as fast as my version.

Clearly, your own optimizations work better than the generic version, which has to make few and conservative assumptions. If you're looking to be maximally portable, stick with fgetc() or use fread() if the size of the object is known. fread() will amortize the penalty of calling stdio over a larger number of bytes.
Aug 17 2004
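The amortization argument above can be illustrated without any stdio at all. The following is a toy model, not the Digital Mars RTL: LockedSource, get_one and get_block are hypothetical names, and std::mutex merely stands in for whatever LockSemaphoreNested does internally. The point is only that get_one pays for one lock acquisition per byte, while get_block pays for one per block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <vector>

// Toy stand-in for a thread-safe stream (hypothetical, for illustration).
struct LockedSource {
    std::mutex lock;
    std::vector<unsigned char> data;
    std::size_t pos = 0;
};

// Like a thread-safe fgetc(): one lock/unlock pair for every byte returned.
int get_one(LockedSource& s) {
    std::lock_guard<std::mutex> guard(s.lock);
    if (s.pos >= s.data.size()) return -1;   // -1 plays the role of EOF
    return s.data[s.pos++];
}

// Like fread(): one lock/unlock pair for up to n bytes.
std::size_t get_block(LockedSource& s, unsigned char* out, std::size_t n) {
    std::lock_guard<std::mutex> guard(s.lock);
    std::size_t avail = s.data.size() - s.pos;
    std::size_t k = n < avail ? n : avail;
    if (k > 0) std::memcpy(out, s.data.data() + s.pos, k);
    s.pos += k;
    return k;
}
```

With a 4096-byte block, get_block takes the lock once where get_one would take it 4096 times, which is the same reason a private buffer like the CppFILE wrapper avoids the cost entirely in a single-threaded program.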
Hello Scott,

Scott Michel wrote...
> Well, you could purchase the CD and look at the code. :-)

I already have. That's why I have the sources.

> It does deal with multithreaded locking of the file descriptor, which is where the slowdown occurs. If you look at the code for LockSemaphoreNested, you'll see a LOCK-prefixed instruction -- this is **really** slow because it forces a lot of synchrony in the Pentium pipeline.

Thanks for the hint.

> Of course, you can create your own CPU-specific version of the RTL because the build system and the code are available from the CD.

It's not necessary at the moment. The app I wrote is only used by myself. Now that I'm aware of this bottleneck I can avoid it.

> Clearly, your own optimizations work better than the generic version, which has to make few and conservative assumptions.

Yes, it's always good to know where the time is spent and how small changes in code can result in great performance gains.

- Heinz
Aug 18 2004
Heinz Saathoff wrote:
> Yes, it's always good to know where the time is spent and how small changes in code can result in great performance gains.

Google for the Linux kernel patch that modifies the kernel at run-time to select the "right" atomic instructions depending on whether the machine is SMP and the processor model/rev. Pretty cool looking stuff, but with the XP patches that prevent modifying executable pages, I doubt this could be easily implemented in the RTL.

-scooter
Aug 18 2004
Try setting the buffer size larger, not smaller, and make it a multiple of 4K.

-Walter

"Heinz Saathoff" <hsaat despammed.com> wrote in message news:MPG.1b868bd6fe18ecf99896e4 news.digitalmars.com...
> Decreasing the buffer size from 1024 bytes to 512 bytes in my (_open, _read, _close)-buffered version increases the runtime from 1.8 seconds to 1.95 seconds. That's not much for the smallest practical buffer size.
Aug 13 2004
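For the stdio side, the buffer size can be changed with the standard setvbuf() call, which has to come after the stream is opened and before any other operation on it. A minimal sketch follows; the helper names set_stream_buffer and count_chars_fgetc and the 64K size used in the usage note are assumptions, not values from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: asks stdio to use a fully buffered stream with a
// library-allocated buffer of `bytes` bytes. Must be called right after
// fopen(), before any read or write on the stream.
bool set_stream_buffer(std::FILE* f, std::size_t bytes) {
    return std::setvbuf(f, nullptr, _IOFBF, bytes) == 0;
}

// Reads the rest of the stream one character at a time and returns the count.
long count_chars_fgetc(std::FILE* f) {
    long n = 0;
    while (std::fgetc(f) != EOF) ++n;
    return n;
}
```

A call such as set_stream_buffer(f, 16 * 4096) requests a 64K buffer, a multiple of 4K. A larger buffer reduces how often stdio goes to the OS, but it does not change the cost of each individual fgetc() call, so whether it helps depends on where the time is actually spent.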
Hello Walter,

The test with the small buffer was to show that my simple buffering is still much faster than the stdio fgetc(). For stdio I didn't change anything. As far as I know, stdio uses buffering too unless it is disabled. I think I will have a look at the stdio sources to find out what happens.

- Heinz

Walter wrote...
> Try setting the buffer size larger, not smaller, and make it a multiple of 4K.
Aug 16 2004
"Heinz Saathoff" <hsaat despammed.com> wrote in message news:MPG.1b8a8600757fa24d9896e6 news.digitalmars.com...
> The test with the small buffer was to show that my simple buffering is still much faster than the stdio fgetc(). For stdio I didn't change anything. As far as I know, stdio uses buffering too unless it is disabled. I think I will have a look at the stdio sources to find out what happens.

fgetc also must do thread synchronization.
Aug 16 2004