- Sponsor
-
Notifications
You must be signed in to change notification settings - Fork 940
Add Solaris FEN using EventPorts from x/sys/unix #371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the README also has a table indicating support and will need to be updated as well.
Is there a preference for keeping the changes all as a single commit, or should some of these changes be split into separate commits?
CHANGELOG.md
Outdated
@@ -1,5 +1,9 @@ | |||
# Changelog | |||
|
|||
## v1.5.0 / 2021-08-28 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This date is made up
Thanks for working on this. Should #263 be closed? |
That is what I would do. While I have your attention, any thoughts on my question:
Thanks! |
7970ecf
to
7b1c530
Compare
Fewer commits (or squash merge) is nice, as you've done here. |
@nshalman Is there someone with FEN/Solaris experience that can provide your code review and approval? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did the testing and found the need to switch the call in AddRaw from os.Stat
to os.Lstat
but there are two other places where I call out to os.Stat
and I'm not 100% clear on the semantics in those situations (in Remove
and at the end of handleEvent
)
I'm not totally convinced that the integration tests are exercising those code paths both with and without symlinks and might need to write some additional tests to figure out for sure if they need to be changed as well or not.
My gut says that they need to be changed too, and that the passing tests when only changing the call in AddRaw indicates a gap in the testing.
@nshalman It may be better to build this out as a separate FEN project -- e.g. I can't see myself maintaining or even doing code reviews on this -- I have enough trouble keeping up with the existing platforms. What do you think? |
Hey, I haven't had free time to work on this code myself. I've still seen odd test failures when I stress test it and I'm not sure if it's bug(s) in my code, bugs in the tests, or what. I have no objection moving the code somewhere else rather than landing it into the tree as is, but my question is how that would enable other software to consume it. Ideally existing software that uses fsnotify that currently doesn't work on Solaris and illumos would be able to start working when a new release of fsnotify comes out that uses it. Do you mean that I could keep the code in a separate package that fsnotify could pull in? Thank you for keeping this conversation alive. I'm happy to work toward whatever makes the most sense when I finally have time to pick it up again. |
I think there is the potential to eventually make fsnotify utilize an external FEN library. In the interim, people who really want Solaris support could wrap fsnotify + fen libraries in some way. For example, Hugo uses both fsnotify + polling (I admit I haven't looked too closely at how they are doing it). The idea of making fsnotify a very light wrapper around platform-specific libraries is something I've been thinking about for a long time. A large part of that is to have small teams that own each platform-specific library and can work freely on them. The cross-platform aspect makes it really tricky for contributors, so I'd like to find a way to alleviate those troubles if possible. |
At it's heart,
That sounds great, but it's not the status quo today. I think it would be unfortunate to put something that might never happen on the critical path to getting this change integrated so people can make use of it.
This would create a lot of extra technical and social work: rather than amend the cross-platform abstraction library, you are asking just one platform to go and reimplement that abstraction in any number of other bodies of consuming software. It seems like I agree that we, and you, are all already resource constrained enough, which is why people have been trying to do the work just once, in the right place: here. It isn't clear how much or what sort of additional code review is necessary before a merge. Can you outline your concerns, and what you feel remains outstanding? People in the community can likely assist with maintenance and debugging, say¸ as issues arise in future -- but it really needs to get merged so that people are actually using it in downstream software. Right now it's just sitting in a branch. |
Updated it in the main branch; didn't merge it here. So if I understand things correctly people will need a fairly recent version of illumos or it will panic? Not sure which versions of illumos distributions contain this change yet (if any?) Need to update the CI and my local test environment, as well as communicate this clearly to users. |
Right now I can find the git hash |
It's in illumos-omnios:master, which will eventually land in a bloody build:
|
And FWIW, you need a fairly recent SmartOS build for this fix:
|
It might panic the golang application (not the operating system.) For most uses cases it probably won't. It took a fair amount of stress testing for me to trip the bug.... If it does panic, the error message points explicitly at golang/go#54254 which will guide users to look into an OS update. @papertigers did ping me on IRC about the need for an update for the CI OS image. Might take a little bit for that to be ready. |
Ah right, I see your latest commit improved things a bit; I didn't look at the details before and just assumed it was 100% fixed on illumos side. I think this is mostly good to merge then, right? We still have the issue of the CI not running here; I looked in that a few weeks ago, and it's not running because I changed:
to:
It seems that it will only run "on" a push to this repo, but not to forks. Makes sense, but behaves different than I expected. These "double" CI runs were causing me real grief because GitHub macOS runners are very limited, and we're spinning off 7 of them now for the various tests; I was waiting almost an hour at times for the CI to finish, so I'd rather not "just" add it back. Even with just What we need is something like:
But I couldn't really figure out how to do that :-/ Maybe I should just change it back for now... |
Replace deprecated use of ioutil During even processing, use LStat, but Stat explicitly watched symlinks Make event bit checking more succinct
The behavior on symlinks you noted worries me; I'd prefer to get it fixed before landing if possible.
The symlink behavior quirks also remind me about #387 and all the other levels of confusion that symlinks can cause relative to fsnotify semantics... We might want to test and document our current behavior on symlinks on Linux in a bunch of different situations and make some improved test cases around them to give me more to work with while hunting it down. I was able to trivially make the tests pass by incorrectly changing the call to reassociate a path that had fired using lstat info, but I'm pretty sure that would fail a more robust test case of multiple write events needing to properly fire. I'm hoping to poke around today to see if I can figure out the correct fix and will update accordingly. |
When I remove the calls to skip the symlink tests, sometimes I see this:
|
Okay, I now understand the symlink issue (passing I have a fix in place, the test skips are removed, and the tests pass! |
I have a reproducible test failure on illumos with go 1.19 when run as root:
The test passes cleanly when run as a non-root user. |
Ah yeah, I forgot about that. So many comments and hard to keep track of stuff. I'll finish up the last few things today or tomorrow and do a new release; as far as I'm concerned it's okay to merge it after that. It doesn't need to be 100% perfect and we can create new issues and improve things before the next release. Personally I'd say that's a tad easier to keep track of things, but whatever works for you is fine with me so it's your call. |
I agree that this is in a good enough state to merge. |
@arp242 I imagine you're pretty busy, but it's been a few weeks so I figured I'd check in. Is there anything I can do to help get the next release cut so that this can be merged after that? I did fix the symlink issue, so I'm feeling very good about this code. Thanks!! |
Yeah, sorry; new job and some other issues I had to deal with 😅 I released v1.6.0 so we have a bit of time to shake out some potential issues before a new release. |
Release 1.7.0 This version of fsnotify needs Go 1.17. Additions: - illumos: add FEN backend to support illumos and Solaris. ([fsnotify#371]) - all: add `NewBufferedWatcher()` to use a buffered channel, which can be useful in cases where you can't control the kernel buffer and receive a large number of events in bursts. ([fsnotify#550], [fsnotify#572]) - all: add `AddWith()`, which is identical to `Add()` but allows passing options. ([fsnotify#521]) - windows: allow setting the ReadDirectoryChangesW() buffer size with `fsnotify.WithBufferSize()`; the default of 64K is the highest value that works on all platforms and is enough for most purposes, but in some cases a highest buffer is needed. ([fsnotify#521]) Changes and fixes: - inotify: remove watcher if a watched path is renamed ([fsnotify#518]) After a rename the reported name wasn't updated, or even an empty string. Inotify doesn't provide any good facilities to update it, so just remove the watcher. This is already how it worked on kqueue and FEN. On Windows this does work, and remains working. - windows: don't listen for file attribute changes ([fsnotify#520]) File attribute changes are sent as `FILE_ACTION_MODIFIED` by the Windows API, with no way to see if they're a file write or attribute change, so would show up as a fsnotify.Write event. This is never useful, and could result in many spurious Write events. - windows: return `ErrEventOverflow` if the buffer is full ([fsnotify#525]) Before it would merely return "short read", making it hard to detect this error. - kqueue: make sure events for all files are delivered properly when removing a watched directory ([fsnotify#526]) Previously they would get sent with `""` (empty string) or `"."` as the path name. - kqueue: don't emit spurious Create events for symbolic links ([fsnotify#524]) The link would get resolved but kqueue would "forget" it already saw the link itself, resulting on a Create for every Write event for the directory. - all: return `ErrClosed` on `Add()` when the watcher is closed ([fsnotify#516]) - other: add `Watcher.Errors` and `Watcher.Events` to the no-op `Watcher` in `backend_other.go`, making it easier to use on unsupported platforms such as WASM, AIX, etc. ([fsnotify#528]) - other: use the `backend_other.go` no-op if the `appengine` build tag is set; Google AppEngine forbids usage of the unsafe package so the inotify backend won't compile there. [fsnotify#371]: fsnotify#371 [fsnotify#516]: fsnotify#516 [fsnotify#518]: fsnotify#518 [fsnotify#520]: fsnotify#520 [fsnotify#521]: fsnotify#521 [fsnotify#524]: fsnotify#524 [fsnotify#525]: fsnotify#525 [fsnotify#526]: fsnotify#526 [fsnotify#528]: fsnotify#528 [fsnotify#537]: fsnotify#537 [fsnotify#550]: fsnotify#550 [fsnotify#572]: fsnotify#572
What does this pull request do?
This pull request adds fsnotify support for Solaris and illumos via Solaris FEN. It is a cleanup of #263 by @isaacdavis which in turn was heavily based on #196, authored by @gfrey. I have kept both of them in the AUTHORS file and added myself as well.
The biggest change from that iteration was moving all of the cgo and unsafe code out into x/sys/unix (see 324630). Additionally, in the process of that work, I wrote a wrapper for port_getn(3c) which I have used here to allow a single call to retrieve multiple events.
Supersedes #263
Fixes #12
Where should the reviewer start?
I squashed and rebased #263 and have preserved my development branch.To see how this code compares to that implementation, this URL will render the diff:
nshalman/fsnotify@rebased-solaris-fen...nshalman:illumos-fen-safer
This isn't really recognizable compared to the original any more. Hopefully
fen.go
having been stripped of all the cgois now readable without really requiring event ports / FEN expertise.
The heart of the cgo bits now live on in x/sys/unix
How should this be manually tested?
All of my testing was run on SmartOS, but the underlying event ports code includes tests that successfully ran under CI on a Solaris host. This PR also includes a GitHub action that runs an OmniOS VM to run the tests for CI.
Unfortunately, I've seen intermittent test failures and I haven't be able to fully identify what is happening with those.I believe I've finally found all the bugs. This code is sensitive to the short sleeps in some of the integration tests, and thus depends on #422.
I've been stress testing with
echo {1..72} | xargs -n 1 | xargs -t -I {} -P 24 go test -count={}
and the VM tests which used to be flaky are now reliable (and are set to run 10 times rather than just once)I have also done a cursory test of compiling hugo (v0.92.0) against this branch and running the tests and resulting binary.
Running the resulting hugo binary under truss reveals port_getn being called with arguments revealing that it's the call in fsnotify.