
Gibberish from longer context #7056

Closed

Drael64 opened this issue May 3, 2024 · 15 comments

@Drael64

Drael64 commented May 3, 2024

I first encountered this problem after upgrading to the latest llama.cpp in SillyTavern. It would generate gibberish no matter what model or settings I used, including models that used to work (like Mistral-based models). It was confusing because the models generated normally in Kobold Lite, so I thought it was a SillyTavern problem.

Then I pasted a perfectly coherent long chat (over 6k tokens) into the prompt in Kobold Lite, and it gave gibberish like "(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr" or meaningless strings of characters and words, just like the bug in SillyTavern. Likewise, if I deleted the character card, author's note, and story string from SillyTavern, it would generate coherently and normally.

So this appears to be llama.cpp, and something to do with a fuller or longer context.

I'm using Vulkan with fairly standard settings on Windows 11. It doesn't matter what kind of base model I use, or any other settings. Basically everything that worked before I updated no longer works (unless I trim the context down to nothing, which makes it pretty useless). I use a max context size of 6144, if that matters, so it's never larger than that.

@JohannesGaessler
Collaborator

Are you getting a big scary warning about degraded outputs in the console?

@Drael64
Author

Drael64 commented May 3, 2024

> Are you getting a big scary warning about degraded outputs in the console?

No, nothing like that.

Actually, this is really confusing, because I can't seem to get Kobold Lite to generate coherent text even with "Hi" as the context, now using Llama-3, but I can with Mistral (and I can with low context in ST).

Perhaps the Llama-3 issue is because I need updated GGUFs. But that doesn't explain why Mistral or Solar is suddenly breaking at higher context sizes - context sizes it used to work fine with. At least I'll try a newer quant of Llama-3 to see if that helps. Even so, the gibberish from other models, like my Solar fine-tune, at contexts that used to work is baffling.

@Drael64
Author

Drael64 commented May 3, 2024

Okay, I can confirm that newer GGUFs work for Llama-3 for me. But oddly, my old Solar fine-tune GGUF that worked perfectly before does not. Has this new release somehow changed the tokenizer even for that? Do I need to redo the GGUF files for my 11B Solar fine-tunes?

@JohannesGaessler
Collaborator

You didn't say how old the version you were using was, and even then I probably won't have a perfect overview of all the changes that have happened. But I would suspect that things will work correctly if you regenerate the GGUF files from the original weights.
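
For reference, a minimal sketch of what regenerating a GGUF from the original HF weights looks like with a current checkout (paths and quant type are placeholders; convert-hf-to-gguf.py and the quantize binary are part of the llama.cpp tree):

# Convert the original HF checkpoint with the current converter
python convert-hf-to-gguf.py /path/to/hf-model --outtype f16 --outfile model-f16.gguf
# Quantize the fresh f16 GGUF, e.g. to Q8_0
./quantize model-f16.gguf model-Q8_0.gguf Q8_0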

@Drael64
Author

Drael64 commented May 3, 2024

> You didn't say how old the version you were using was, and even then I probably won't have a perfect overview of all the changes that have happened. But I would suspect that things will work correctly if you regenerate the GGUF files from the original weights.

I mean, I wouldn't know - somewhere between Mistral and now. I was talking to some other people on Discord, and they seemed to be saying that all old GGUF files, regardless of model, are now broken, so that's what I'll try next: regenerating the models. Wild if true - just think of how much space is now wasted/broken on HF!

Anyway, I'll try to get back in a day or two on this, but you can close it if you want.

@navr32

navr32 commented May 3, 2024

Hi!
I get gibberish with the Vulkan backend too, even if I load other models. After asking one to three, sometimes four, questions, the output turns into endless gibberish until I stop it with Ctrl+C.
In the example below, after just one question, the model starts to reply and then degenerates into gibberish.
Thanks to all.


 ./main -ngl 41 -cml -t 14  -m /llama-2-13b-chat.Q8_0.gguf --log-enable
Log start
main: build = 2781 (6ecf3189)
main: built with cc (GCC) 13.2.1 20230801 for x86_64-pc-linux-gnu
main: seed  = 1714773679
llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from llama-2-13b-chat.Q8_0.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 5120
llama_model_loader: - kv   4:                          llama.block_count u32              = 40
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 13824
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 40
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 40
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0,000010
llama_model_loader: - kv  10:                          general.file_type u32              = 7
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0,000000, 0,000000, 0,000000, 0,0000...
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  18:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   81 tensors
llama_model_loader: - type q8_0:  282 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 5120
llm_load_print_meta: n_head           = 40
llm_load_print_meta: n_head_kv        = 40
llm_load_print_meta: n_layer          = 40
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 5120
llm_load_print_meta: n_embd_v_gqa     = 5120
llm_load_print_meta: f_norm_eps       = 0,0e+00
llm_load_print_meta: f_norm_rms_eps   = 1,0e-05
llm_load_print_meta: f_clamp_kqv      = 0,0e+00
llm_load_print_meta: f_max_alibi_bias = 0,0e+00
llm_load_print_meta: f_logit_scale    = 0,0e+00
llm_load_print_meta: n_ff             = 13824
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000,0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 13B
llm_load_print_meta: model ftype      = Q8_0
llm_load_print_meta: model params     = 13,02 B
llm_load_print_meta: model size       = 12,88 GiB (8,50 BPW) 
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: AMD Radeon RX 7900 XTX | uma: 0 | fp16: 1 | warp size: 64
llm_load_tensors: ggml ctx size =    0,37 MiB
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 41/41 layers to GPU
llm_load_tensors:        CPU buffer size =   166,02 MiB
llm_load_tensors:    Vulkan0 buffer size = 13023,85 MiB
....................................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000,0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:    Vulkan0 KV buffer size =   400,00 MiB
llama_new_context_with_model: KV self size  =  400,00 MiB, K (f16):  200,00 MiB, V (f16):  200,00 MiB
llama_new_context_with_model: Vulkan_Host  output buffer size =     0,12 MiB
llama_new_context_with_model:    Vulkan0 compute buffer size =    85,00 MiB
llama_new_context_with_model: Vulkan_Host compute buffer size =    11,01 MiB
llama_new_context_with_model: graph nodes  = 1286
llama_new_context_with_model: graph splits = 2

system_info: n_threads = 14 / 24 | AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | 
main: interactive mode on.
Reverse prompt: '<|im_start|>user
'
sampling: 
	repeat_last_n = 64, repeat_penalty = 1,000, frequency_penalty = 0,000, presence_penalty = 0,000
	top_k = 40, tfs_z = 1,000, top_p = 0,950, min_p = 0,050, typical_p = 1,000, temp = 0,800
	mirostat = 0, mirostat_lr = 0,100, mirostat_ent = 5,000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature 
generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 17


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

<s> <|im_start|>system
<|im_end|>
> Speak me about wavelength ?
 Wavelength is a fundamental concept in physics that refers to the distance between two successive peaks or troughs of a wave. It is usually measured in meters or nanometers, and it determines the size of the wave.

In the context of light, wavelength is related to the color of the light. Different wavelengths of light correspond to different colors, with shorter wavelengths appearing blue and longer wavelengths appearing red. For example, visible light has wavelengths between approximately 400 and 700 nanometers, which corresponds to the colors of the visible spectrum, from violet to red.

Wavelength is also important in other fields, such as audio, where it determines the pitch of a sound, and in oceanography, where it is used to describe the size of ocean waves.

Is there anything else you would like to know about wavelength ?

álvaПетерpplyПетерischofomas welcomeetten断cher halERRolacherITH reciERRJECT CavalcitaLENGola断pperscherischofITH断 icJECT hal CavalJECT welcomecitaettenLENGПетерischofJECT ProofomasFILESpplyischof icuntoatore MAR icomasálvacher CavalERR Cavaletten halimasischofoluoluolu halcher hal断omasahlola断ERRombres marcahlcheromas ic Хронологија halomasálvaJECTERR断ettenettencherischofizersuntoettenischofizerscherERRolaПетер Stevenspply Cavalomas welcomepply Stevens halimas断LENGettenJECTLENGTH Stevens marc welcomeischofischofuntoПетер断 icJECT icischofcitaolapplyimasПетер断LENG MARimas Caval ProofcherERR Caval hal Caval marcischofJECT welcomeoluLENGTHolu StevensolacitaJECT Хронологија reci MARERRITHJECTITHppersettenLENGTH ProofПетерПетер iccita断 welcome断olupply recipply Cavaletten reciálva welcome断ERRppersimas welcomeppersoluombresimasálvaischofLENGTH reciálvaischof Caval halizers marcERRindre reci断pplyJECToluizers Хронологијаimasombres welcome StevensuntoahlITH reciJECTJECT CavalERRindreLENGTHITHizersischof haloluLENGTH断álvaomasimasischoficoscherimasoluischofppersindreola MAR CavalindreischofERRizersLENGTHicosJECTterraLENGTHetten Stevensolaálvaolaischof断 Хронологијаizersomasicos CavalombresITH MARombres icolaomasizers reciizers reci MAR hal marc ProofcitaizersombrescherizersITHuntountochercher CavalischofischofatoreizersFILESПетерizersomasettenFILESettencher reci recioluПетер断ERRpplyJECTimasFILESola断pplyizersomasindre断omas断oluoluettenizers断unto reciischof CavalLENGERRJECTLENGTH CavalterraLENGischofizers断LENGTH断断 ProofFILESálvaERR StevensERRПетерLENGcherpply icITHJECTischofombres marcischof MARcitaizersizerspply断断ettenatore CavalLENGПетерoluombresПетерoluppers MARcherimas Proof icterraahl CavalERRcher Cavalppersppers welcome hal Caval MARimasolu hal MARahl MARomas Хронологијаppersombrescheratore welcomeicosettenomasomasERR StevensimasettenJECT StevensindreizersuntoischofLENGálvaicosischofuntoITH icFILES断imasombres Proofunto welcomeppers marc Stevens断 CavalFILES icola reci Cavalicoscita Хронологијаomas welcomecitaFILES icolaLENG marc welcomecitaischof ProofcherFILESERRatoreomas marcolacher Stevens Stevensombresterracher halERR StevensettenizersLENGTHettenuntoFILESombresischofomasizersoluolaombresizersombresindre hal MARindrecher reci marc welcome CavalITHicos marccher断LENGischofПетер Caval MARizers断 halERRicos icERRLENGTHombresischofПетерicoscherterraJECTJECTJECTizersppersombresálvaJECT reci hal CavalizersLENGischofITHettenПетерpplyLENGTH ProofLENGcherERRicosERRischofПетерLENGTHuntountoJECT hal reciolu icola halolaoluLENGTHcher Cavalterraomasimasolu ProofLENGTHuntoálvaJECTolu MARischofcita marcicosolupplyischofITHindreERRizers Stevensischof Хронологија Proof断olaizersettenLENGetten Хронологија Proof Proofterraombresicosettenettenolu marcERRERRERRomasERRizersischof MARoluischofERRunto halterraITHálva断断ITHERR icppers断ERRindre halombres Хронологија halERRLENGettenterra断FILESolu halombresoluppers marcola Cavalunto halERRJECTcitaischofischof StevensppersLENGomasITH MARLENGcher Proof断 ic断断ahlicos CavalischofischofПетерálvaomasomascher hal reciimasischof marcischofindreLENGTH welcomeicos ProofLENGTH welcomeizerspply Caval Stevensomasicos reci CavalischofПетерcher Proof断 ic reciizersolaLENGischof Stevensicosomascitaetten welcome MARomasLENGTHERRetten断icosJECTola marcoluetten icpplycitaischofpplyITH Хронологија断FILESFILESettenERRomas断JECTuntoindreПетер CavalizersischofITHahl Caval断etten welcomeJECT halJECTolaimas MARolu断terracherITHJECTuntoicos hal Stevensetten welcomeimaspplyLENGTH marcJECT marcimasimasПетерombres 
reciolaolaolaischofola断断JECTimasischofimas haletten断untountoombresimasJECT ProofERRITHetten Caval icFILESoluischofJECTolu marcola welcomeindreterra halischofolaatoreJECTimascita marcoluERRetten Stevenscherimasola hal Caval断imasischof

@silibattlebot

You can get gibberish pretty easily if you go over the default context limit (n_ctx = 512).
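
For what it's worth, the log above does show n_ctx = 512, the default when no context size is requested. A minimal sketch of raising it (model path is a placeholder; -c / --ctx-size is the relevant flag):

# Ask for a 4096-token context instead of the 512-token default;
# generating past n_ctx is a classic source of gibberish
./main -m ./llama-2-13b-chat.Q8_0.gguf -ngl 41 -cml -c 4096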

@Drael64
Author

Drael64 commented May 4, 2024

I can confirm that requantizing the GGUF from scratch fixes this issue with older models too (like my Solar fine-tune). So it does appear that the problem is not a bug per se, but rather that the newest version of llama.cpp breaks all historical GGUFs.

@navr32

navr32 commented May 4, 2024

Yes, OK, I have updated the model. The model doesn't give any errors if run on CUDA, even with ctx 16512. With the Vulkan backend, it is better after upgrading the model and limiting the ctx, but it still turns to gibberish again after more questions are asked.
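
For anyone comparing the two backends: CUDA vs. Vulkan is a build-time choice. A sketch assuming the CMake flags llama.cpp used around this time (worth verifying against your checkout):

# Vulkan build
cmake -B build-vulkan -DLLAMA_VULKAN=ON && cmake --build build-vulkan -j
# CUDA build, for comparison
cmake -B build-cuda -DLLAMA_CUDA=ON && cmake --build build-cuda -j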

@SomeOddCodeGuy

SomeOddCodeGuy commented May 5, 2024

Someone on Reddit asked me, on a post about Mac speeds after flash attention, whether I was experiencing any odd gibberish issues, so I decided to test it out. Here are my results using KoboldCpp 1.64 on my M2 Ultra Mac Studio (my setup makes it challenging to run this same test directly against llama.cpp on the Mac):

Mixtral 8x7b Instruct v0.1, sending 15k context:

  • Flash attention on: gibberish response
  • Flash attention off: valid response

OpenHermes-2.5 Mistral 7b, sending the same 15k:

  • Flash attention on: gibberish response
  • Flash attention off: gibberish response

Nous Capybara 34b, sending the same 15k:

  • Flash attention on: valid response
  • Flash attention off: valid response

Midnight Miqu 70b v1.5, sending the same 15k:

  • Flash attention on: valid response
  • Flash attention off: valid response

NOTE: Again, these are from KoboldCpp, so it could be that the above issue is actually a Kobold issue. But I wanted to add the info in case it might help.

Additional note: None of the models in question gave me the reduced-quality warning on load. I'm familiar with the warning, as I got it on older quants of Llama 3 and Command-R, but these were not giving it.

One more note: I meant to mention that I did try Mixtral 8x7b Instruct v0.1 with flash attention at a lower context (around 6k), and it was just fine.
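
For anyone trying to reproduce this against llama.cpp directly: recent builds expose flash attention behind a flag, so the two cases above should correspond to something like the following (model path and prompt file are placeholders; check that your build is new enough to have -fa):

# Flash attention on
./main -m mixtral-8x7b-instruct.Q8_0.gguf -c 16384 -fa -f long-prompt.txt
# Flash attention off (the default)
./main -m mixtral-8x7b-instruct.Q8_0.gguf -c 16384 -f long-prompt.txt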

@turian

turian commented May 6, 2024

This is related to #7049, I believe.

I think it's a poor solution to say: requant old stuff.
There are a lot of HF models that people might have only in GGUF form, with no idea whether they are "too old".

The correct solution would be for llama.cpp to exit with an error (which MAYBE you can override) if it detects that the GGUF tokenization will no longer match the HF AutoTokenizer tokenization.
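
As a rough approximation of that check today, the tokenize example binary that ships with llama.cpp can be used to compare an old GGUF against a freshly converted one (paths and test string are placeholders; any difference means the old file's tokenizer metadata is stale):

# Tokenize the same text with the old and the regenerated GGUF
./tokenize old-model.gguf "The quick brown fox" > old.txt
./tokenize new-model.gguf "The quick brown fox" > new.txt
# A non-empty diff means the old GGUF no longer tokenizes like the source model
diff old.txt new.txt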

@ggerganov
Owner

There are at least 4 different issues discussed here - just reopen with specific llama.cpp repro instructions.

@spydaz

spydaz commented May 7, 2024

> I first encountered this problem after upgrading to the latest llama.cpp in SillyTavern. It would generate gibberish no matter what model or settings I used, including models that used to work (like Mistral-based models). […]

Yes, the same thing was happening to me in LM Studio, which also uses llama.cpp - but when I used GPT4All the models worked fine!

I was very unsure where the problem was, since I did not have it with Transformers - so it was here! Now that you mention the long context: I had even dropped long context from my models, thinking it was a problem with the YaRN embeddings (RoPE) not working correctly. Hopefully LM Studio will come back working!

@spydaz

spydaz commented May 7, 2024

> Hi! I get gibberish with the Vulkan backend too, even if I load other models. […]

Exactly this (I deleted many models thinking I had trained them wrong)... so now I test in Colab first before converting! (It may even be a problem with conversion?) It happens with Q4_K_M/S; with Q8, no problems.

@navr32

navr32 commented May 16, 2024

I just updated and rebuilt to the latest git, and now all the tests I have done work with the Vulkan backend, with any model, even the ones that warn about degraded output. Thanks to all.
