efficiently parsing org files
if you just want the code snippet you may scroll down.
introduction
it has always bothered me that org-agenda takes quite some time to load even when org-agenda-files contains very few files. currently on my machine there is a noticeable delay of 7 seconds to load ~10 files, and that is on emacs 31.0.50 with native compilation enabled. this isn't specific to org-agenda either. when i was using org-roam it took a few minutes to parse my 2k+ org files. org-roam then caches its results to avoid this delay each time it's queried, but this isn't a real solution.
the issue isn't elisp, and it's not org-element either (the parser for org's syntax). i figured this out after messing around. this page highlights the bottlenecks and how to work around them. unfortunately it doesn't provide a solution for org-agenda, because i didn't put in the time to try and make it work. it mainly provides a way to query a bunch of org files for metadata (or for the full syntax trees fwiw).
the bottleneck
find-file itself has a delay even for files that don't require much work to open or display. this small delay stacks up and turns from a fraction of a second into minutes when we try to parse a thousand org files.
(benchmark-run (find-file user-init-file))
(0.187511496 0 0.0)
this delay is also present in other functions such as find-file-noselect, which are used internally to open org files for org-agenda or other modules.
(benchmark-run (find-file-noselect user-init-file))
(0.212753316 0 0.0)
notice however that this delay is only there when the file isn't already open in emacs, so subsequent find-file calls to the same path will not be as bad. but that is not the case when org-agenda has to open org files that aren't already open.
the delay on my machine seems to be consistently around 0.2 seconds. for my 2k org files this means a total of about 400 seconds just to open the files, which isn't ideal when you want to fire up some menu that needs to query metadata from all of your files.
this delay is compounded even further when we try to open an org-mode file (probably because org-mode startup is slow too):
(benchmark-run (find-file-noselect "/home/mahmooz/brain/notes/1707069432.org"))
(0.361507639 0 0.0)
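to reproduce this measurement on your own files, a small helper along these lines may be useful. it is only a sketch: my/cold-open-seconds is a hypothetical name, not something from emacs or org, and it returns nil instead of timing a file that is already visited, since that would measure the fast path rather than a cold open.

```elisp
(require 'benchmark)

;; hypothetical helper, not part of emacs or org
(defun my/cold-open-seconds (file)
  "Return the seconds `find-file-noselect' takes to open FILE.
Return nil if FILE is already visited, since that would not
measure a cold open."
  (unless (get-file-buffer file)
    (let (buf)
      (unwind-protect
          (benchmark-elapse
            (setq buf (find-file-noselect file)))
        ;; kill the buffer so the next call is a cold open again
        (when (buffer-live-p buf)
          (kill-buffer buf))))))

;; rough total for 2000 files, assuming every open costs about the same:
;; (* 2000 (my/cold-open-seconds "/path/to/some-file.org"))
```

the path in the last comment is a placeholder. multiplying a single measurement by the number of files is the same back-of-the-envelope estimate as above, not a precise prediction.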
the workaround
the workaround is not to use find-file directly but to make use of insert-file-contents and run the org-element parser directly on the contents. for example:
(benchmark-run
  (with-temp-buffer
    ;; the final `t' is to replace contents of the temp buffer
    (insert-file-contents "/home/mahmooz/brain/notes/1707069432.org"
                          nil
                          nil
                          nil
                          t)
    (let ((major-mode 'org-mode))
      (org-element-parse-buffer))
    nil))
(0.003604351 0 0.0)
this takes about 3.6 milliseconds, roughly a hundred times less than find-file-noselect needed for the same file (0.36 seconds).
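wrapped up as a function, the same idea might look like the sketch below. my/org-parse-file is a hypothetical name. binding org-element-use-cache to nil is an assumption of this sketch (a throwaway buffer gains nothing from the cache), and the optional granularity argument is simply passed through to org-element-parse-buffer.

```elisp
(require 'org)
(require 'org-element)

;; hypothetical helper: parse FILE without visiting it via find-file
(defun my/org-parse-file (file &optional granularity)
  "Return the org-element syntax tree of FILE.
GRANULARITY is passed on to `org-element-parse-buffer'."
  (with-temp-buffer
    (insert-file-contents file)
    (let ((major-mode 'org-mode)        ; pretend the buffer is in org-mode
          (org-element-use-cache nil))  ; assumption: skip the element cache
      (org-element-parse-buffer granularity))))
```

calling (my/org-parse-file "/path/to/note.org" 'headline) would then return a tree that only goes as deep as headlines. the path is a placeholder.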
the final snippet
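with the help of https://karthinks.com/ i arrived at this final snippet. it reuses a single temp buffer for all files, skips org's startup work, hooks and element cache, and increases the garbage-collection threshold to make the garbage collector run less often.
;; call func on each of files while a shared temp buffer holds that
;; file's contents, and return the list of results
(defun map-org-files (files func)
  (let ((org-inhibit-startup t)
        (org-element-cache-persistent) ;; bound to nil
        (org-element-use-cache)        ;; bound to nil
        (org-mode-hook)                ;; bound to nil
        (files (if (atom files) (list files) files))
        (gc-cons-threshold 100000000) ;; 100mb
        (coding-system-for-read 'utf-8))
    (with-temp-buffer
      (buffer-disable-undo)
      (mapcar
       (lambda (orgfile)
         ;; the final `t' replaces the previous file's contents
         (insert-file-contents orgfile nil nil nil t)
         (let ((major-mode 'org-mode))
           (funcall func orgfile)))
       files))))
benchmark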
running this on my 2.5k org files, which contain >100k lines of text:
(benchmark-run
  (progn
    (map-org-files
     ;; "\\.org\\'" only matches names that end in .org
     (directory-files "/home/mahmooz/brain/notes/" t "\\.org\\'")
     (lambda (orgfile) (org-element-parse-buffer)))
    nil))
(3.640452928 3 0.5723500680000022)
it takes 3-4 seconds. this can be reduced even further (to roughly 0.35 seconds for all files) if we tell org-element to only parse as deep as headlines:
(benchmark-run
  (progn
    (map-org-files
     (directory-files "/home/mahmooz/brain/notes/" t "\\.org\\'")
     (lambda (orgfile) (org-element-parse-buffer 'headline)))
    nil))
(0.355560082 0 0.0)
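a headline-only tree is already enough for things like outlines. as a small illustration, the sketch below collects the headline titles from a string of org text using org-element-map. my/org-headline-titles is a hypothetical helper, and it parses a string rather than a file only to keep the example self-contained.

```elisp
(require 'org)
(require 'org-element)

;; hypothetical helper: list the headline titles found in org TEXT
(defun my/org-headline-titles (text)
  "Return the raw titles of all headlines in the org-formatted string TEXT."
  (with-temp-buffer
    (insert text)
    (let ((major-mode 'org-mode)        ; pretend the buffer is in org-mode
          (org-element-use-cache nil))  ; assumption: skip the element cache
      ;; parse only as deep as headlines, then walk the tree
      (org-element-map (org-element-parse-buffer 'headline) 'headline
        (lambda (headline)
          (org-element-property :raw-value headline))))))

(my/org-headline-titles "* one\nsome text\n** two\n")
;; => ("one" "two")
```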
replicating org-roam-open
replicating something like org-roam-open can then be as simple as:
(require 'cl-lib) ;; for cl-position
(defvar *my-org-dir*
  "/home/mahmooz/brain/notes/")
;; function to read a top-level org keyword
(defun org-get-keyword (kw)
  (let ((value (cadar (org-collect-keywords (list kw)))))
    value))
(defun my-org-open ()
  (interactive)
  (let* ((files (directory-files *my-org-dir* t "\\.org\\'"))
         (titles (map-org-files
                  files
                  (lambda (orgfile)
                    (org-get-keyword "title"))))
         (choice (completing-read "file: " titles))
         (choice-idx (cl-position choice titles :test 'string=)))
    ;; choice-idx is nil when the input matches none of the titles
    (when choice-idx
      (find-file (elt files choice-idx)))))
my-org-open will give us a list of titles and choosing one will open the corresponding org-mode file.
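one weakness of looking the choice up by position is that files without a #+title keyword yield nil entries in the list of titles. a possible variation, sketched below, pairs each title with its file up front and falls back to the file's base name. my/title-file-alist is a hypothetical helper, not part of the snippet above.

```elisp
(require 'cl-lib)

;; hypothetical helper: build an alist of (title . file)
(defun my/title-file-alist (files titles)
  "Pair each entry of TITLES with the matching entry of FILES.
A nil title is replaced by the base name of its file."
  (cl-mapcar (lambda (file title)
               (cons (or title (file-name-base file)) file))
             files titles))

(my/title-file-alist '("/notes/a.org" "/notes/b.org") '("first note" nil))
;; => (("first note" . "/notes/a.org") ("b" . "/notes/b.org"))
```

the alist can be handed to completing-read as the collection, and (cdr (assoc choice alist)) then gives the file to open. the paths in the example are placeholders.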
more example usage
;; grab titles
(map-org-files
 (directory-files "/home/mahmooz/brain/notes/" t "\\.org\\'")
 (lambda (orgfile) (org-get-title)))
;; grab other keywords from files (the function org-get-keyword is defined above)
(map-org-files
 (directory-files "/home/mahmooz/brain/notes/" t "\\.org\\'")
 (lambda (orgfile)
   (list :date (org-get-keyword "date")
         :file orgfile
         :id (org-id-get))))
;; grab ASTs
(map-org-files
 (directory-files "/home/mahmooz/brain/notes/" t "\\.org\\'")
 (lambda (orgfile) (org-element-parse-buffer)))