using/doc issues #424

Open
clach04 opened this issue Nov 14, 2022 · 1 comment

clach04 commented Nov 14, 2022

This may be a ticket that needs to be broken into several tickets, but I thought it easier to open one now and split it as needed rather than open a bunch up front.

  • Support for text files is not documented. I was surprised to see txt files copied and then modified. This actually fails when the files are not UTF-8 and/or the system locale does not match UTF-8, because the JSON payload added to the head of the file won't match the encoding of the rest. The hack below gets it working on my machine. I'm still unclear on the why here (I suspect backtracking support).
    • Some Linux installs default to 7-bit US-ASCII
    • Many Windows installations in the US and Western Europe default to
  • The error count at the end of an import is confusing. In my case I'm pretty confident this is a "skip" count, where the files were skipped because they already existed (i.e. duplicate avoidance).
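
To make the encoding failure in the first bullet concrete, here is a minimal sketch (Python 3 assumed) of reading the first-line JSON header without depending on the system locale. The function name `read_metadata_line` is hypothetical and is not part of elodie; the idea is simply to read bytes and decode only the first line as UTF-8.

```python
import json
import os
import tempfile


def read_metadata_line(path):
    """Return the JSON object on the first line of `path`, or None."""
    # Binary mode: nothing is decoded with the locale's preferred encoding,
    # so the bytes after the first line can be in any encoding.
    with open(path, 'rb') as f:
        first_line = f.readline().strip()
    try:
        parsed = json.loads(first_line.decode('utf-8'))
    except (UnicodeDecodeError, ValueError):
        # Not UTF-8, or not JSON: treat the file as having no header.
        return None
    return parsed if isinstance(parsed, dict) else None


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'note.txt')
        # A UTF-8 JSON header followed by a Latin-1 encoded body.
        with open(path, 'wb') as f:
            f.write(b'{"title": "x"}\n' + b'caf\xe9\n')
        print(read_metadata_line(path))
```

By contrast, `open(path, 'r')` without an explicit `encoding` decodes with the locale's preferred encoding, which is why the same file can load on one machine and raise `UnicodeDecodeError` on another.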

text file hack

Superseded by #441 to resolve #440

Basically, preserve the existing data (in whatever encoding it's in) but add a UTF-8 encoded first line ending with a newline.

$ git diff elodie/media/text.py
diff --git a/elodie/media/text.py b/elodie/media/text.py
index 4e3c6bb..54b9d33 100644
--- a/elodie/media/text.py
+++ b/elodie/media/text.py
@@ -145,8 +145,15 @@ class Text(Base):
         if source is None:
             return None

+        # FIXME  / TODO document why this is being done. A *.txt file is being opened BUT only the first line is read and then assumed to be a complete valid payload? Why do the IO, what purpose does this serve? What possible sort of usefulness does this offer?
         with open(source, 'r') as f:
-            first_line = f.readline().strip()
+            #first_line = f.readline().strip()
+            try:
+                first_line = f.readline().strip()
+            except UnicodeDecodeError:
+                print('file %r UnicodeDecodeError' % source)
+                #raise
+                return None  # seems to be in keeping with other exit points

         try:
             parsed_json = loads(first_line)
@@ -191,6 +198,9 @@ class Text(Base):
                     copyfileobj(f_read, f_write)
         else:
             # Prepend the metadata to the file
+            #print('DEBUG %r' % metadata_as_json)
+            #print('DEBUG %r' % type(metadata_as_json))
+            """
             with open(source, 'r') as f_read:
                 original_contents = f_read.read()
                 with open(source, 'w') as f_write:
@@ -198,6 +208,13 @@ class Text(Base):
                         metadata_as_json,
                         original_contents)
                     )
+            """
+            with open(source, 'rb') as f_read:
+                original_contents = f_read.read()
+                with open(source, 'wb') as f_write:
+                    f_write.write(metadata_as_json.encode('utf8'))  # write first line json (utf-8 encoded) header
+                    f_write.write(original_contents)  # whatever format was already there
+
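
The write side of the hack above can also be sketched as a standalone helper (Python 3 assumed). The name `prepend_metadata_line` and its arguments are hypothetical, not elodie's API. Unlike the diff, it appends the newline explicitly, on the assumption that the serialized JSON does not already end with one.

```python
import json


def prepend_metadata_line(path, metadata):
    """Prepend `metadata` as a UTF-8 JSON line, keeping the old bytes intact."""
    # json.dumps escapes non-ASCII by default, so the header is plain ASCII,
    # which is also valid UTF-8.
    header = json.dumps(metadata).encode('utf-8') + b'\n'
    # Read everything first: opening with 'wb' truncates the file.
    with open(path, 'rb') as f_read:
        original_contents = f_read.read()
    with open(path, 'wb') as f_write:
        f_write.write(header)
        f_write.write(original_contents)  # untouched, whatever its encoding
```

Note that this rewrite is not atomic: if the process dies between the truncate and the final write, the original contents are lost. Writing to a temporary file and renaming it over the original would be safer.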

error report doc comments

$ git diff elodie/result.py
diff --git a/elodie/result.py b/elodie/result.py
index 3fa7851..650411d 100644
--- a/elodie/result.py
+++ b/elodie/result.py
@@ -15,7 +15,7 @@ class Result(object):
         if status:
             self.success += 1
         else:
-            self.error += 1
+            self.error += 1  # which may simply mean skipped, not an actual error!
             self.error_items.append(id)

     def write(self):
@@ -32,7 +32,7 @@ class Result(object):
         headers = ["Metric", "Count"]
         result = [
                     ["Success", self.success],
-                    ["Error", self.error],
+                    ["Error", self.error],  # which may simply mean skipped, not an actual error!
                  ]

         print("****** SUMMARY ******")
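
One way the summary could separate skips from real failures is sketched below. This is only an illustration: the class `ImportTally`, its `record` method and the outcome constants are all hypothetical, and this is not how elodie's `Result` class is written.

```python
class ImportTally(object):
    """Hypothetical tally that counts skipped files apart from errors."""

    SUCCESS, SKIPPED, ERROR = 'success', 'skipped', 'error'

    def __init__(self):
        self.success = 0
        self.skipped = 0
        self.error = 0
        self.error_items = []

    def record(self, item_id, outcome):
        if outcome == self.SUCCESS:
            self.success += 1
        elif outcome == self.SKIPPED:
            # e.g. the file already exists in the destination (duplicate)
            self.skipped += 1
        else:
            self.error += 1
            self.error_items.append(item_id)

    def rows(self):
        # Same Metric/Count layout as the summary table in the diff above.
        return [
            ["Success", self.success],
            ["Skipped", self.skipped],
            ["Error", self.error],
        ]
```

With a split like this, an import that only skipped duplicates would report zero errors, which is the behaviour the second bullet at the top is asking for.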

clach04 commented Nov 14, 2022

BTW thanks for making this available. It implements a bunch of stuff I don't want to implement myself, so it saved me a bunch of work (even with the current reverse geocode issue with the MapQuest API) :-)
