Load audio asynchronously (fix #4880) #2928
Conversation
I don't know about this. This makes it so the game will keep playing while the voice file is being opened, which means in degenerate cases the character might not speak for a while despite needing to. Skyrim had an annoying bug just like this; a character's supposed to say something, and sometimes they'd just stare blankly for an extra long moment before giving their line, as if an asynchronous load was being stalled.

Additionally, this seems like it could cause problems with scripts, where they can call say on a character and then check saydone before the voice clip gets a chance to play.

FWIW, the voice is already being streamed, which means it's not being decoded all at once (in fact, a fair bit of work went into being able to stream voices and build up the 'lip' data in small bits with each update). If the issue is specifically with the open call, it may be worth looking into what exactly is making it slow and see if that can be fixed or improved. FFmpeg is a pretty large library, so it's possible there's more efficient ways to prepare sound files for our audio decoding needs.
Regardless of the fact that this doesn't seem to be complete, it's a very strict taboo to access audio assets from disk on a game logic thread, so moving it onto another thread is completely appropriate. You're going to get random multi-second stalls on file/filesystem operations sometimes no matter what you're doing, even if you're just setting up buffers for streaming. There's no problem with, for example, having the game go on for a second while an audio asset loads into cache the first time it's used, as long as it's somehow made invisible to game logic. Or do you want to preload every voice asset in the game?
I can add a deadline in frame numbers: if the file isn't opened within 1 to n frames, wait until it is.
I missed this, but the current logic is not clear. It was introduced by 4c2b694.
If it's the disk, then there is nothing to improve.
sayActive is not supposed to be a negation of sayDone. sayDone is supposed to be a vanilla-compatible implementation of SayDone, which returns 1 during the frame a speech sound is started (i.e. no current say sound) and it's only used for that scripting function. sayActive returns the actual status of a speech sound (1 if there's a sound) used for our game mechanics. So sayDone shouldn't care if there's something in the queue. Don't "fix" it, it'll break mods.
I checked the script in the game and found that it actually works, so I will not change it. I need to make the appropriate changes to this PR now.
Only if disk access is temporarily blocked due to other processes hammering the disk. The kernel should be expected to balance the needs of various apps to avoid resource starvation, and disk access is no exception. However, with this it's not just dependent on disk access, but also on thread scheduling and the work queue to not cause stalls.
I disagree. Having an audio cue delayed from the corresponding visual cue is quite jarring. If you move the mouse, you wouldn't want to see the game continue rendering new frames but not actually turn the camera for an extra 30+ frames afterward when a needed model gets loaded. Similarly, if you hit someone with a sword, you wouldn't want the game to continue on only to hear the hit/grunt sound a half second later, after the particle effects and stagger animation are finished. Sound feedback is important, and you don't want to let it lag behind the game simulation too far or go too out of sync with the visuals.
Loading sounds as needed shouldn't be a significant stall. Games were loading sounds synchronously on mechanical drives decades ago and had no problems holding 60fps. Longer sounds, like voice lines, would be streamed to avoid loading too much at once. That some people are experiencing stalls on modern SSDs from just opening a file (not even fully decoding/loading it) points to an issue being elsewhere. Throwing it onto another thread is just hiding the problem and exchanging it for others.
A deadline would be preferable to no deadline. However, you're potentially making the problem worse since the overhead of asynchronous workloads is being added.
It depends on what the problem is. Like I mentioned, FFmpeg is a big library. It's designed to automagically handle tons of formats for many different purposes. However, what we need when opening a regular wav/mp3/ogg/etc audio file to get decoded PCM samples from is different than what we'd need for opening a random video file that can have multiple video and audio streams we may want to dynamically jump between during playback. When naively opening a file, ffmpeg has to be prepared for us wanting the latter, even though we really only need the former. As shots in the dark, we might be able to improve it by pre-preparing codecs and converters that can be shared between instances, without having to search/detect and recreate them for each file.
Is it too much overhead to pre-emptively get a handle to each audio file and maybe read the header, so we've not got any of the actual file data in memory, but don't need to do any setup for streaming either and can start immediately once it's time to actually play something?
This is not actually true in practice. File access can freeze up for just about any reason, like the disk scheduler randomly deciding to go on a picnic, or the filesystem driver deciding to do an expensive journaling operation (depending on how it's designed).
A thread that's not being waited on getting stuck for a few seconds won't cause stalls, and describing it as such is misleading.
Yes, actually, I would. If the game locks while I'm turning the camera real fast because it wants to load in a model, I might get physically sick. Really, models should be loaded the moment they come into the game world, but that's not appropriate for the vast number of voice files in an RPG, and voice files don't have a physical presence in the game world, so they're not really comparable either way.
This is true for normal sound effects or for dialogue in cutscenes, but in Morrowind the typical voice line is someone making combat grunts or making a greeting, and a little delay is acceptable for both of them.
Synchronous file access has, actually, in fact, been a problematic source of hitches and stuttering in mainstream games for a long time, including in major games like Team Fortress 2. Though these days the problem with hitches is coming from shader compilation instead...
Yes. It does. But historically speaking, problems with file access randomly locking up for a few seconds even if you're just grabbing a file handle or reading a header are generally caused by the OS. Which we're not exactly going to fix.
There only seem to be around 200MB of (non-music) audio assets in Morrowind, so this actually shouldn't be too bad. In fact, if you have a computer with enough RAM to run a modern web browser, you actually could just preload all the voice assets in the entire game, at least in their compressed form, despite me suggesting such as a joke earlier. EDIT: Just opening them as handles would be fine, but there's like 6000 of them, and OSs generally have a limit to how many file handles a process can have open at once. It looks like normal values for this limit range from around 500 to several thousand, and it depends on the particular OS etc.
We wouldn't be able to hold open a handle to each audio file, as the OS provides only a limited number of file handles (my system only allows 1024 file descriptors, and there's over 7000 loose wav/mp3 files in a vanilla install). Though we might be able to pre-scan each one to get the codec it needs and precalculate its length, to help reduce the work ffmpeg needs to do when loading the file during play. We might also be able to limit the allowed codecs so it doesn't try to match files to dozens of completely esoteric formats nobody really uses (or at least give higher priority to ones we more expect to be used).
It would stall a sound event since it can't play until the work queue finishes the task (which in turn depends on the worker thread getting scheduled, potentially vying for a timeslice on some core that the main thread or rendering thread also wants, and not having any other tasks already in the queue it needs to wait for). And since voices aren't and can't realistically be cached, this can potentially happen every time someone speaks.
I guess we have different standards, then. If I see the game continue without doing something it should do (turn the camera, play a sound), that makes me think it's bugging out because of an error it's not reporting. If there's resource contention and the game has to pause for a quick moment to load something, I can tell it's an issue outside of the app's direct control. Neither is good, but silently delaying feedback is worse to me than an occasional very slight hitch from loading/preparing an uncached resource.
This exact same argument can be used for the game hitching, not just voice files being delayed. I don't know about you, but I don't want to walk into a new building and get a series of seconds-long stutters because every NPC decides to greet me at the same time (actual experience, minus the stutters).
My system is a good 10+ years old; ancient dual-core processor, mechanical HD, low memory, the whole nine yards, and the only times I'd notice a hitch with voices is if I attacked a room with several NPCs causing them all to shout in alarm at the same exact time. During normal play where only a single character would start speaking on a given frame, it was completely seamless. However, I'd not be surprised if this causes problems for my system since there's only so much processor time to go around, meaning the work queue is inherently more flaky when it comes to completing what would be very quick work in a reasonable amount of time; freeing up memory to avoid swap or stopping background processes that want to use the disk won't be able to help since the problem won't be the disk. |
I gathered some profiling data for openmw + ffmpeg 4.2.3 using systemtap with a script. There are two kinds of issues:
Top calls by sum duration: (output elided)
Stack: (output elided)
Top calls by sum duration: (output elided)
Stack: (output elided)
(force-pushed from 7a1d265 to 5777737)
I want to make music decoding async too.
To clarify, this PR doesn't make voices decode asynchronously. They're already streamed with a background thread. All this PR does is open the file asynchronously, then start a streaming sound.
It already is.
(force-pushed from 3337f4e to 45b5422)
OK, you're right. I meant creating a decoder and opening the file. It's not full decoding, but it is part of the decoding process. I've changed all the naming in the PR to reflect this. Sounds are left now.
(force-pushed from cd6187c to ce7a915)
Honestly, if the performance of opening audio files is a real problem, it would be best to have a generic solution. As it is, sound effects are going to be loaded about as often as voices (any time a sound is first used, or if it's been purged from cache due to lack of use), which are opened and decoded synchronously.
If you really need to, though, a more general solution may be to handle audio similar to graphics. Put all the stuff with openal and ffmpeg on a separate thread, separate from the main simulation, and have the main thread just supply the updates for a given frame. A fence of some sort can be used to ensure a frame has been processed before handling the next frame, preventing it from lagging too far behind.
Essentially preloading (priming) assets before we need them, in other words? We already do this with meshes/textures (preload in settings).
Sort of. My original thought was to go through the audio files at load time and have ffmpeg detect what codec it would use, and store that information with the sound definition. Then when it needs to be loaded, we could tell ffmpeg to use that codec (or prioritize that codec) so it doesn't have to probe the file during runtime. We could also look into seeing if calling […] can be avoided, given the warning in its documentation indicating they know it's a time-consuming call.
Looking at this sample, I agree.

Top calls by duration: (output elided)
Top calls by sum duration: (output elided)
Stack: (output elided)
Has anyone tried building FFmpeg with the necessary options to make it profilable and then working out what it's actually spending time on?
I have. This comment contains some info. I've tried to profile the read syscall. Here is one slow frame sample: https://gist.github.com/elsid/326cae87b36b59e03098897600ce3b14 . The interesting part is this subtree: (output elided)
And some samples of slow file reads: (output elided)
Are there plans to continue development, or can this PR be closed?
If https://gitlab.com/OpenMW/openmw/-/merge_requests/520 is merged, I can continue to work on this.
The mentioned MR was merged to master.
It's used only there.
The volume factor is passed as an argument when the engine logic requests a sound to be played. The SFX volume is loaded from the ESM file. These values are not available at the same time and can change independently. With asynchronous sound loading, the SFX volume becomes available after access to the volume factor argument is lost, so it needs to be kept as a separate field.
To avoid the initial file read from disk happening on the main thread. A deadline is used to avoid a noticeable delay in the start of actors' speech. This change affects only voices and music; sounds are not affected. Decoder initialization requires opening the file to read its headers, which leads to a disk read. This operation is performed by a separate thread via SceneUtil::WorkQueue. Created and initialized decoders are stored in a map from file name to decoder. Other data required to play a voice is stored in the SoundManager state and used on each update from the main thread to try to start playing voices and music once the corresponding decoder is available. If it's not available, playback is delayed until the next frame. When the deadline to create the decoder is reached, the main thread stops and waits until that decoder is created.
To avoid the initial file read from disk happening on the main thread. A deadline is used to avoid a noticeable delay in sound start. This change affects only sounds, which are played more often than voices and music. There is a pool of created and initialized sound buffers that allows playing them from memory. When a sound buffer is first requested, openmw has to create a decoder, read the headers, and potentially read all the data. The initial request for a sound buffer is performed in a separate thread via SceneUtil::WorkQueue. Created buffers are stored in a pool; if the required sound buffer is already present in the pool, the separate thread is not used. Other data required to play a sound is stored in the SoundManager state and used on each update from the main thread to try to start playing sounds once the corresponding sound buffer is available. If it's not available, playback is delayed until the next frame. When the deadline to load a sound is reached, the main thread stops and waits until the sound is loaded and added to the pool.
Rebased on master; includes https://gitlab.com/OpenMW/openmw/-/merge_requests/567 . Two functional changes from the previous state:
I think the current state is good enough to be merged, though I still expect review and testing. If you want an overview of what this change does, please read the commit messages; I've tried to explain the context and the change in some detail.
class Sound;
class Stream;

class OpenAL_Output : public Sound_Output
{
    const DecoderProvider* mDecoderProvider;
More a stylistic thing that doesn't really matter too much, but this can be a reference instead of a pointer. It would avoid accidental changes, and make it clear that it's set once on initialization and never changed.
@@ -20,6 +25,7 @@ namespace MWSound
 class OpenAL_Output : public Sound_Output
 {
     const DecoderProvider* mDecoderProvider;
+    const VFS::Manager* mVFS;
This is redundant since the decoder also has it. It also creates an implicit dependency of sorts between the output and the decoder... the output checks its VFS to see if the file exists, then passes that filename to the decoder expecting it to open with its VFS. Not that they're expected to be different, but it's something that can trip up someone casually looking through the code.
@@ -109,7 +109,7 @@ namespace MWSound
 SoundManager::~SoundManager()
 {
-    clear();
+    SoundManager::clear();
Marking the SoundManager class as final would also avoid a virtual dispatch, both here and elsewhere that it calls its own virtual methods.
I still don't like this. It makes the code a lot more complicated and harder to follow, making it do more work overall, and introduces potential undesirable latency issues (which can only be avoided by making the code more complicated, causing potentially worse stalls than it was trying to avoid in the first place). On top of the increased potential for bugs to seep in. Weren't there concerns that it wasn't safe to use the work queue from here like this?
I would much rather explore other options for making opening audio files more efficient, including trying to make ffmpeg not aggressively scan files for stream information, better hint at it the necessary codecs, using a more lightweight library like libsndfile, or even putting all audio work onto its own dedicated worker thread, like OSG does for OpenGL.
It's just an opinion.
It does make the code more complicated, like asynchronous physics in this project, or anywhere else you introduce asynchronicity. There are no additional latency issues beyond what we have now.
There are no concerns. The work queue is just a thread with a queue. Please stop repeating this groundless idea.
Only putting every audio file into memory at game start would avoid IO-related latency during a game session; who does it doesn't matter: the OS, openmw, or some library. Putting all audio work onto its own thread would require a lot more code and architecture changes, and all operations with sound would become asynchronous, because the caller would post a request to play a sound, to stop a sound, etc. This would create much more complicated client code. OSG is a bad example here, because it prepares a frame in a separate thread but the engine doesn't start a new one until work on the previous one is done. With audio you can only expect that IO will finish in some number of frames; you must not synchronize on it.
Yes there are. As it is now, the sound manager calls the decoder to load a sound. The decoder opens the file and loads it into a buffer, waiting only as long as those operations take to complete, before it can start playing. With it being asynchronous, the sound manager sends off a work queue item to have the decoder open and load the file. That work item then has to wait until the work queue wakes up and gets to the task, does it, and marks it complete. Then the next time the sound manager updates, it sees the load is complete and can play. So rather than decoding and starting playback ASAP, the decoding is delayed by however long it takes the work queue to deal with it, and playback waits until the next time the sound manager updates after loading completed. You'd be better off saying there'd be no appreciable latency issues, since these added delays would ideally be very short. However, I'm not sure that's a safe guarantee. And given a cap on the time the main simulation will wait, any significant delays caused by the work queue would make loading stutters worse.
But as your previous posts show, that's not the actual problem here. The majority of the stall comes from
Maybe, but it would be more straightforward and logical. There's less back-and-forth between the frontend and backend as the frontend merely tells the backend what to do, like play a sound or move the sound position, and the backend handles it all from there. The frontend doesn't need to hear back that the backend has done anything. At most it will query the backend for the state of active objects, e.g. whether a sound is still playing or the current playback offset of a sound. It puts more work in the backend to be able to stage changes and have a background thread to process them, but to the frontend, the behavior is the same since there's little feedback necessary. I'm not saying I'm terribly keen on the idea, but the conceptual design is easier to understand and work with, and less invasive. Yes, it would take a fair amount of work, and is part of why I'd check other simpler options first, like trying to find ways to make ffmpeg more efficient with open files (by more strongly indicating the format, or making it not probe the whole file for every bit of information) or using a different, more lightweight decoder.
In the best-case scenario playback starts on the next frame. That is OK, because on that frame the picture changes to represent what happened in the game logic. For example, when an actor has to speak, the voice is requested and the actor's animation starts on the same frame; on the next frame the player sees the changed image with the started animation, or maybe even one frame later if there is some kind of buffer. There is a case where the deadline hasn't hit but more than one frame has passed since playback was requested: some additional latency may happen, but it is less than one frame's duration, which is not noticeable by a human at 60 FPS. The worst case is when the deadline hits: there is no additional latency, because SoundManager waits until the task is complete and starts playback right after. It's unlikely for this PR to be merged in the current state, or in any state using the same approach, and another approach would require a fresh start, so I don't see a point in keeping this open.
The idea is to use a background thread to open voice files. The problem was first discovered in https://gitlab.com/OpenMW/openmw/-/issues/4880 . My research shows this is the only thing that may have a significant impact on a single frame's time in the mechanics manager. This graph shows the frame time for each frame before this change:

There are multiple one-frame-duration spikes. This happens on an SSD. Digging into the problem shows the same cause as bug 4880.
After the change, the graph for the same test doesn't have this kind of spike:
