Skip to content

I found excelize(golang) is 2~3 times slower than js-xlsx #439

@xtutu

Description

@xtutu

Platform: Windows 10
go version go1.12.4 windows/amd64
node version v8.11.1


image


Maybe there is some problem with my usage?

code is here:

package main
import (
	"fmt"
	"time"
    "github.com/360EntSecGroup-Skylar/excelize"
)
func main() {
	excelPath := "data.xlsx"

	timeStart := time.Now().UTC().UnixNano() / 1e6
	times := 5
	for i :=0 ; i < times; i++ {
		f, _ := excelize.OpenFile(excelPath)
		_, _ = f.GetRows("main")
	}

	timeFinished := time.Now().UTC().UnixNano() / 1e6
	fmt.Println(timeFinished - timeStart)
}
import * as XLSX from 'xlsx';
function run() {
  let currentMilliSec = Date.now()
  let times = 5

  let excelPath = "data.xlsx"
  for (let i = 0; i < times; i++) {
    var workbook = XLSX.readFile(excelPath);
    let sheet = workbook.Sheets["main"];

    // let ref: string = sheet['!ref'];
    // console.log(ref);
    // let xx = ref.split(':')
    // console.log(xx[0], xx[1]);
    // let start = XLSX.utils.decode_cell(xx[0]);
    // console.log(start.c, start.r, XLSX.utils.encode_cell(start));
    // let end = XLSX.utils.decode_cell(xx[1]);
    // console.log(end.c, end.r, XLSX.utils.encode_cell(end));

    // let starRowIdx = 0
    // for (let r = 0; r <= end.r; r++) {
    //   let cellName = XLSX.utils.encode_cell({ c: 0, r: r });
    //   let cell = sheet[cellName]
    // }
  }

  let after = Date.now()
  console.log("js cost: ", after - currentMilliSec, "ms")
}

run()

Activity

mlh758

mlh758 commented on Jul 20, 2019

@mlh758
Contributor

Does worksbook.Sheets["main"] actually parse and read each row? GetRows does, it looks like you have the actual reading code commented out on the JS tests there.

xtutu

xtutu commented on Jul 22, 2019

@xtutu
Author

Does worksbook.Sheets["main"] actually parse and read each row?
Yes.

Actually

var workbook = XLSX.readFile(excelPath); //  this code is already finished parse.

Because even I uncomment the original code, Time Cost Not Changed !

@mlh758

mlh758

mlh758 commented on Jul 23, 2019

@mlh758
Contributor

I’ll have to try the benchmarks out. We’ve had a few passes to improve the performance of the library but there is probably plenty of work that could be done as a community. Using Go benchmarks and the profiler could help highlight the slowest areas.

Also, many Node libraries are hooked up to some pretty efficient C libraries under the hood.

mlh758

mlh758 commented on Jul 31, 2019

@mlh758
Contributor

I had a pretty big excel file on hand so I went ahead and tried things out, here are my benchmark files:

package main

import (
	"fmt"
	"time"

	"github.com/360EntSecGroup-Skylar/excelize"
)

func main() {
	excelPath := "data.xlsx"

	timeStart := time.Now()
	times := 5
	for i := 0; i < times; i++ {
		f, _ := excelize.OpenFile(excelPath)
		rows, _ := f.Rows("DTAs")
		for rows.Next() {
			row := rows.Columns()
			for i := range row {
				if i >= 0 {
					continue
				}
			}
		}
	}

	fmt.Println(time.Since(timeStart).Seconds())
}
const XLSX = require('xlsx');

function run() {
  let currentMilliSec = Date.now()
  let times = 5

  let excelPath = "data.xlsx"
  for (let i = 0; i < times; i++) {
    var workbook = XLSX.readFile(excelPath);
    let sheet = workbook.Sheets["DTAs"];
  }

  let after = Date.now()
  console.log("js cost: ", after - currentMilliSec, "ms")
}

run()

The go example ran in 17 seconds, the js example ran in 20 seconds, which puts them at roughly even. If I use GetRows like you did in your benchmarks the Go code runs up to 25 seconds. Seems like Rows is much more efficient to use.

mlh758

mlh758 commented on Jul 31, 2019

@mlh758
Contributor

I just noticed I ran the benchmarks against 1.4.1. If I switch the import to "github.com/360EntSecGroup-Skylar/excelize/v2" and run against 2.0.1 the times don't change much though.

mlh758

mlh758 commented on Jul 31, 2019

@mlh758
Contributor

For fun I moved this to a benchmark:

package main

import (
	"testing"

	"github.com/360EntSecGroup-Skylar/excelize/v2"
)

const excelPath = "data.xlsx"

func BenchmarkMain(b *testing.B) {
	for i := 0; i < b.N; i++ {
		f, _ := excelize.OpenFile(excelPath)
		rows, _ := f.Rows("DTAs")
		for rows.Next() {
			row, _ := rows.Columns()
			for i := range row {
				if i >= 0 {
					continue
				}
			}
		}
	}
}

Then ran it with the profilers enabled:

go test -bench=BenchmarkMain -benchmem -memprofile memprofile.out -cpuprofile profile.out

CPU profile results:

Showing top 10 nodes out of 177
      flat  flat%   sum%        cum   cum%
     720ms 19.51% 19.51%      720ms 19.51%  runtime.memmove
     290ms  7.86% 27.37%      290ms  7.86%  runtime.(*mspan).refillAllocCache
     230ms  6.23% 33.60%      230ms  6.23%  runtime.(*mspan).init
     160ms  4.34% 37.94%      350ms  9.49%  encoding/xml.(*printer).marshalAttr
     160ms  4.34% 42.28%      160ms  4.34%  runtime.memclrNoHeapPointers
     120ms  3.25% 45.53%     2210ms 59.89%  encoding/xml.(*Decoder).unmarshal
     110ms  2.98% 48.51%      200ms  5.42%  runtime.scanobject
     100ms  2.71% 51.22%     1120ms 30.35%  runtime.mallocgc
      90ms  2.44% 53.66%      150ms  4.07%  encoding/xml.(*Decoder).getc
      80ms  2.17% 55.83%       80ms  2.17%  runtime.bulkBarrierPreWrite

Memory:

Showing top 10 nodes out of 56
      flat  flat%   sum%        cum   cum%
  247.02MB 20.59% 20.59%   252.52MB 21.05%  encoding/xml.(*Decoder).rawToken
  198.97MB 16.59% 37.18%   198.97MB 16.59%  reflect.unsafe_NewArray
  123.88MB 10.33% 47.50%   123.88MB 10.33%  bytes.makeSlice
  106.01MB  8.84% 56.34%   138.01MB 11.50%  encoding/xml.(*printer).marshalAttr
   92.50MB  7.71% 64.05%   345.03MB 28.76%  encoding/xml.(*Decoder).Token
   83.51MB  6.96% 71.01%    83.51MB  6.96%  reflect.Zero
   72.56MB  6.05% 77.06%    72.56MB  6.05%  bytes.Replace
   68.20MB  5.69% 82.74%    84.20MB  7.02%  github.com/360EntSecGroup-Skylar/excelize/v2.checkRow
   64.50MB  5.38% 88.12%   729.01MB 60.77%  encoding/xml.(*Decoder).unmarshal
      33MB  2.75% 90.87%       33MB  2.75%  reflect.packEface

It looks like the vast majority of our time is spent the standard library's xml decoder.

mlh758

mlh758 commented on Jul 31, 2019

@mlh758
Contributor

cpu

For visual reference

mlh758

mlh758 commented on Aug 1, 2019

@mlh758
Contributor

@xtutu sent me the data file he was using for testing. When I use Rows() on it I get a segfault from rows.go line 106. Not sure what's going on there yet, but we almost certainly have a bug.

Edit: Nope, typo on my part. Wasn't checking the errors since it was a benchmark. Library is fine.

added a commit that references this issue on Aug 5, 2019
451e6e3
added 3 commits that reference this issue on Oct 23, 2020
04960ce
be2933b
6aeee6d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @mlh758@xtutu

        Issue actions

          I found excelize(golang) is 2~3 times slower than js-xlsx · Issue #439 · qax-os/excelize